Abstract. Given a compact parameter set $\mathbf{Y}\subset\mathbb{R}^p$, we consider polynomial
optimization problems $(\mathbf{P}_{\mathbf{y}})$ on $\mathbb{R}^n$ whose description depends on the
parameter $\mathbf{y}\in\mathbf{Y}$. We assume that one can compute all moments of some probability
measure $\varphi$ on $\mathbf{Y}$, absolutely continuous with respect to the Lebesgue
measure (e.g. $\mathbf{Y}$ is a box or a simplex and $\varphi$ is uniformly distributed). We then
provide a hierarchy of semidefinite relaxations whose associated sequence of
optimal solutions converges to the moment vector of a probability measure
that encodes all information about all global optimal solutions $\mathbf{x}^*(\mathbf{y})$ of $\mathbf{P}_{\mathbf{y}}$,
as $\mathbf{y}$ ranges over $\mathbf{Y}$. In particular, one may approximate as closely as desired any
polynomial functional of the optimal solutions, e.g. their $\varphi$-mean. In addition,
using this knowledge on moments, the measurable function $\mathbf{y}\mapsto x_k^*(\mathbf{y})$ of the
$k$-th coordinate of optimal solutions can be estimated, e.g. by maximum
entropy methods. Also, for a boolean variable $x_k$, one may approximate as
closely as desired its persistency $\varphi(\{\mathbf{y} : x_k^*(\mathbf{y}) = 1\})$, i.e. the probability that
in an optimal solution $\mathbf{x}^*(\mathbf{y})$, the coordinate $x_k^*(\mathbf{y})$ takes the value 1. Last
but not least, from an optimal solution of the dual semidefinite relaxations,
one obtains a sequence of polynomial (resp. piecewise polynomial) lower
approximations with $L_1(\varphi)$ (resp. almost uniform) convergence to the optimal
value function.



1. Introduction 

Roughly speaking, given a set of parameters $\mathbf{Y}$ and an optimization problem whose
description depends on $\mathbf{y}\in\mathbf{Y}$ (call it $\mathbf{P}_{\mathbf{y}}$), parametric optimization is concerned
with the behavior and properties of the optimal value as well as primal (and
possibly dual) optimal solutions of $\mathbf{P}_{\mathbf{y}}$, when $\mathbf{y}$ varies in $\mathbf{Y}$. This is a quite challenging
problem and in general one may obtain information only locally around some nominal
value $\mathbf{y}_0$ of the parameter. There is a vast and rich literature on the topic and for
a detailed treatment, the interested reader is referred to e.g. Bonnans and Shapiro
[4] and the many references therein. Sometimes, in the context of optimization
with data uncertainty, some probability distribution $\varphi$ on the parameter set $\mathbf{Y}$ is
available and in this context one is also interested in e.g. the distribution of the
optimal value and of the optimal solutions, all viewed as random variables. In particular, for
discrete optimization problems where cost coefficients are random variables with
joint distribution $\varphi$, some bounds on the expected optimal value have been
obtained. More recently Natarajan et al. [17] extended the earlier work in [3] to even
provide a convex optimization problem for computing the so-called persistency
values¹ of (discrete) variables, for a particular distribution $\varphi^*$ in a certain set $\Phi$ of
distributions. However, this convex formulation requires knowledge of the convex
hull of a discrete set and approximations are needed. The approach is nicely
illustrated on a discrete choice problem and a stochastic knapsack problem. For more
details on persistency in discrete optimization, the interested reader is referred to
[17] and the references therein.

In the context of polynomial equations whose coefficients are themselves
polynomials of some parameter $\mathbf{y}\in\mathbf{Y}$, some specific "parametric" methods exist. For
instance, one may compute symbolically once and for all what is called a
comprehensive Gröbner basis, i.e., a fixed basis that is a Gröbner basis for all $\mathbf{y}\in\mathbf{Y}$; see
Weispfenning [25] and more recently Rostalski fW for more details. Then when
needed, one may compute the solutions for a specific value of the parameter $\mathbf{y}$, e.g.
by the eigenvalue method of Möller and Stetter [inilH]. However, one still needs
to apply the latter method for each value of the parameter $\mathbf{y}$. A similar two-step
approach is also proposed for homotopy (instead of Gröbner bases) methods in [19].

The purpose of this paper is to show that if one restricts to the case of polynomial
parametric optimization then all information about the optimal value and optimal
solutions can be obtained, or at least approximated as closely as desired.

Contribution. We here restrict our attention to parametric polynomial optimization,
that is, when $\mathbf{P}_{\mathbf{y}}$ is described by polynomial equality and inequality constraints
on both the parameter vector $\mathbf{y}$ and the optimization variables $\mathbf{x}$. Moreover, the
set $\mathbf{Y}$ is restricted to be a compact basic semi-algebraic set of $\mathbb{R}^p$, and preferably
a set sufficiently simple so that one may obtain the moments of some probability
measure $\varphi$ on $\mathbf{Y}$, absolutely continuous with respect to the Lebesgue measure. For
instance if $\mathbf{Y}$ is a simple set (like a simplex or a box) one may choose $\varphi$ to be the
probability measure uniformly distributed on $\mathbf{Y}$; typical $\mathbf{Y}$ candidates are polyhedra.
Or sometimes, in the context of optimization with data uncertainty, $\varphi$ is already
specified. We also suppose that $\mathbf{P}_{\mathbf{y}}$ has a unique optimal solution for almost all
values of the parameter $\mathbf{y}\in\mathbf{Y}$. In this specific context we are going to show that
one may get insightful information on the set of all global optimal solutions of $\mathbf{P}_{\mathbf{y}}$,
via what we call a "joint+marginal" approach. Our contribution is as follows:

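(a) Call $J(\mathbf{y})$ (resp. $\mathbf{X}^*_{\mathbf{y}}\subset\mathbb{R}^n$) the optimal value (resp. the set of
optimal solutions) of $\mathbf{P}_{\mathbf{y}}$ for the value $\mathbf{y}\in\mathbf{Y}$ of the parameter. We first define
an infinite-dimensional optimization problem $\mathbf{P}$ whose optimal value is exactly
$\rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y})$. Any optimal solution of $\mathbf{P}$ is a probability measure $\mu^*$ on
$\mathbb{R}^n\times\mathbb{R}^p$ with marginal $\varphi$ on $\mathbb{R}^p$. It turns out that $\mu^*$ encodes all information on
the optimal solutions $\mathbf{X}^*_{\mathbf{y}}$, $\mathbf{y}\in\mathbf{Y}$. Whence the name "joint+marginal", as $\mu^*$ is a
joint distribution of $\mathbf{x}$ and $\mathbf{y}$, and $\varphi$ is the marginal of $\mu^*$ on $\mathbb{R}^p$.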

(b) Next, we provide a hierarchy of semidefinite relaxations of $\mathbf{P}$ with associated
sequence of optimal values $(\rho_i)_i$, in the spirit of the hierarchy defined in [13]. An
optimal solution of the $i$-th semidefinite relaxation is a sequence $\mathbf{z}^i = (z^i_{\alpha\beta})$ indexed
in the monomial basis $(\mathbf{x}^\alpha\mathbf{y}^\beta)$ of the subspace $\mathbb{R}[\mathbf{x},\mathbf{y}]_{2i}$ of polynomials of degree
at most $2i$. If for almost all $\mathbf{y}\in\mathbf{Y}$, $\mathbf{P}_{\mathbf{y}}$ has a unique global optimal solution
$\mathbf{x}^*(\mathbf{y})\in\mathbb{R}^n$, then as $i\to\infty$, $\mathbf{z}^i$ converges pointwise to the sequence of moments of
$\mu^*$ defined in (a). In particular, one obtains the distribution of the optimal solution
$\mathbf{x}^*(\mathbf{y})$, and therefore one may approximate as closely as desired any polynomial
functional of the solution $\mathbf{x}^*(\mathbf{y})$, e.g. the $\varphi$-mean or variance of $\mathbf{x}^*(\mathbf{y})$.

¹ Given a 0-1 optimization problem $\max\{\mathbf{c}'\mathbf{x} : \mathbf{x}\in\mathcal{X}\cap\{0,1\}^n\}$ and a distribution $\varphi$ on $\mathbf{c}$,
the persistency value of the variable $x_i$ is $\mathrm{Prob}_\varphi(x_i^* = 1)$ at an optimal solution $\mathbf{x}^*(\mathbf{c}) = (x_i^*)$.

In addition, if the optimization variable $x_k$ is boolean then one may approximate
as closely as desired its persistency $\varphi(\{\mathbf{y} : x_k^*(\mathbf{y}) = 1\})$ (i.e., the probability that
$x_k^*(\mathbf{y}) = 1$ in an optimal solution $\mathbf{x}^*(\mathbf{y})$), and one also obtains a necessary and sufficient
condition for this persistency to be 1.

(c) Finally, let $e(k)\in\mathbb{N}^n$ be the vector $(\delta_{j=k})_j$. Then as $i\to\infty$, and for
every $\beta\in\mathbb{N}^p$, the sequence $(z^i_{e(k)\beta})_i$ converges to $z^*_{k\beta} := \int_{\mathbf{Y}}\mathbf{y}^\beta g_k(\mathbf{y})\,d\varphi(\mathbf{y})$ for the
measurable function $\mathbf{y}\mapsto g_k(\mathbf{y}) := x_k^*(\mathbf{y})$. In other words, the sequence $(z^*_{k\beta})_{\beta\in\mathbb{N}^p}$
is the moment sequence of the measure $d\psi(\mathbf{y}) := x_k^*(\mathbf{y})\,d\varphi(\mathbf{y})$ on $\mathbf{Y}$. And so, the
$k$-th coordinate function $\mathbf{y}\mapsto x_k^*(\mathbf{y})$ of optimal solutions of $\mathbf{P}_{\mathbf{y}}$, $\mathbf{y}\in\mathbf{Y}$, can be
estimated, e.g. by maximum entropy methods. Of course, the latter estimation
is not pointwise but it still provides useful information on optimal solutions, e.g.
the shape of the function $\mathbf{y}\mapsto x_k^*(\mathbf{y})$, especially if the function $x_k^*(\cdot)$ is continuous,
as illustrated on some simple examples. For instance, for parametric polynomial
equations, one may use this estimation of $\mathbf{x}^*(\mathbf{y})$ as an initial point for Newton's
method for any given value of the parameter $\mathbf{y}$.

Finally, the computational complexity of the above methodology is roughly the
same as that of the moment approach described in [13] for an optimization problem with
$n+p$ variables, since we consider the joint distribution of the $n$ variables $\mathbf{x}$ and the
$p$ parameters $\mathbf{y}$. Hence, the approach is particularly interesting when the number
of parameters is small, say 1 or 2. In addition, in the latter case the max-entropy
estimation has been shown to be very efficient in several examples in the literature;
see e.g. HSl Hi]. However, in view of the present status of SDP solvers, if no
sparsity or symmetry is taken into account as proposed in e.g. [14], the approach
is limited to small to medium size polynomial optimization problems.

But this computational price may not seem that high in view of the ambitious 
goal of the approach. After all, keep in mind that by applying the moment approach 
to a single (n + p)-variables problem, one obtains information on global optimal 
solutions of an n-variables problem that depends on p parameters, that is, one 
approximates n functions of p variables! 

2. A RELATED LINEAR PROGRAM 

Let $\mathbb{R}[\mathbf{x},\mathbf{y}]$ denote the ring of polynomials in the variables $\mathbf{x} = (x_1,\ldots,x_n)$ and
the variables $\mathbf{y} = (y_1,\ldots,y_p)$, whereas $\mathbb{R}[\mathbf{x},\mathbf{y}]_k$ denotes its subspace of polynomials
of degree at most $k$. Let $\Sigma[\mathbf{x},\mathbf{y}]\subset\mathbb{R}[\mathbf{x},\mathbf{y}]$ denote the subset of polynomials that
are sums of squares (in short s.o.s.). For a real symmetric matrix $\mathbf{A}$ the notation
$\mathbf{A}\succeq 0$ stands for $\mathbf{A}$ is positive semidefinite.

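The parametric optimization problem. Let $\mathbf{Y}\subset\mathbb{R}^p$ be a compact set, called
the parameter set, and let $f, h_j : \mathbb{R}^n\times\mathbb{R}^p\to\mathbb{R}$, $j = 1,\ldots,m$, be continuous. For
each fixed $\mathbf{y}\in\mathbf{Y}$, consider the following optimization problem:

(2.1)    $J(\mathbf{y}) := \inf_{\mathbf{x}}\,\{\, f_{\mathbf{y}}(\mathbf{x}) : h_{\mathbf{y}j}(\mathbf{x})\geq 0,\ j = 1,\ldots,m \,\}$

where the functions $f_{\mathbf{y}}, h_{\mathbf{y}j} : \mathbb{R}^n\to\mathbb{R}$ are defined via:

    $\mathbf{x}\mapsto f_{\mathbf{y}}(\mathbf{x}) := f(\mathbf{x},\mathbf{y})$,
    $\mathbf{x}\mapsto h_{\mathbf{y}j}(\mathbf{x}) := h_j(\mathbf{x},\mathbf{y})$, $j = 1,\ldots,m$,
    for all $\mathbf{x}\in\mathbb{R}^n$ and all $\mathbf{y}\in\mathbb{R}^p$.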

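Next, let $\mathbf{K}\subset\mathbb{R}^n\times\mathbb{R}^p$ be the set:

(2.2)    $\mathbf{K} := \{\,(\mathbf{x},\mathbf{y}) : \mathbf{y}\in\mathbf{Y};\ h_j(\mathbf{x},\mathbf{y})\geq 0,\ j = 1,\ldots,m \,\}$,

and for each $\mathbf{y}\in\mathbf{Y}$, let

(2.3)    $\mathbf{K}_{\mathbf{y}} := \{\,\mathbf{x}\in\mathbb{R}^n : h_{\mathbf{y}j}(\mathbf{x})\geq 0,\ j = 1,\ldots,m \,\}$.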

The interpretation is as follows: $\mathbf{Y}$ is a set of parameters and for each instance
$\mathbf{y}\in\mathbf{Y}$ of the parameter, one wishes to compute an optimal decision vector $\mathbf{x}^*(\mathbf{y})$
that solves problem (2.1). Let $\varphi$ be a Borel probability measure on $\mathbf{Y}$, with a
positive density with respect to the Lebesgue measure on $\mathbb{R}^p$. For instance choose
for $\varphi$ the probability measure

    $\varphi(B) := \left(\int_{\mathbf{Y}} d\mathbf{y}\right)^{-1}\int_{\mathbf{Y}\cap B} d\mathbf{y}$,    $\forall B\in\mathcal{B}(\mathbb{R}^p)$,

uniformly distributed on $\mathbf{Y}$. Sometimes, e.g. in the context of optimization with
data uncertainty, $\varphi$ is already specified.

We will use $\varphi$ (or more precisely, its moments) to get information on the
distribution of optimal solutions $\mathbf{x}^*(\mathbf{y})$ of $\mathbf{P}_{\mathbf{y}}$, viewed as random vectors.
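
When $\mathbf{Y}$ is a box, the moments of the uniform measure factor into one-dimensional integrals and can be computed exactly. The following Python sketch illustrates this; the function names and the use of exact rational arithmetic are illustrative assumptions and not part of the method described here.

```python
from fractions import Fraction
from itertools import product


def uniform_box_moment(beta, lower, upper):
    """Moment int_Y y^beta dphi(y) for phi uniform on the box prod_k [lower_k, upper_k]."""
    moment = Fraction(1)
    for b, lo, hi in zip(beta, lower, upper):
        lo, hi = Fraction(lo), Fraction(hi)
        # One-dimensional factor: (1 / (hi - lo)) * int_lo^hi t^b dt.
        moment *= (hi ** (b + 1) - lo ** (b + 1)) / ((b + 1) * (hi - lo))
    return moment


def uniform_box_moments(max_degree, lower, upper):
    """All moments with |beta| <= max_degree, keyed by the exponent tuple beta."""
    p = len(lower)
    return {
        beta: uniform_box_moment(beta, lower, upper)
        for beta in product(range(max_degree + 1), repeat=p)
        if sum(beta) <= max_degree
    }


# Hypothetical example: Y = [0, 1]^2, moments up to total degree 2.
gamma = uniform_box_moments(2, lower=(0, 0), upper=(1, 1))
```

Exact fractions avoid rounding in the data of the relaxations; for other simple sets (e.g. a simplex) a different closed form would be needed.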



In the rest of the paper we assume that for every $\mathbf{y}\in\mathbf{Y}$, the set $\mathbf{K}_{\mathbf{y}}$ in (2.3) is
nonempty.

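2.1. A related infinite-dimensional linear program. Let $\mathbf{M}(\mathbf{K})$ be the set of
finite Borel measures on $\mathbf{K}$, and consider the following infinite-dimensional linear
program $\mathbf{P}$:

(2.4)    $\rho := \inf_{\mu\in\mathbf{M}(\mathbf{K})}\left\{\,\int_{\mathbf{K}} f\,d\mu \ :\ \pi\mu = \varphi \,\right\}$,

where $\pi\mu$ denotes the marginal of $\mu$ on $\mathbb{R}^p$, that is, $\pi\mu$ is a probability measure on
$\mathbb{R}^p$ defined by

    $\pi\mu(B) := \mu(\mathbb{R}^n\times B)$,    $\forall B\in\mathcal{B}(\mathbb{R}^p)$.

Notice that $\mu(\mathbf{K}) = 1$ for any feasible solution $\mu$ of $\mathbf{P}$. Indeed, as $\varphi$ is a probability
measure and $\pi\mu = \varphi$, one has $1 = \varphi(\mathbf{Y}) = \mu(\mathbb{R}^n\times\mathbb{R}^p) = \mu(\mathbf{K})$.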

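Recall that for two Borel spaces $\mathcal{X}, \mathcal{Y}$, the graph $\mathrm{Gr}\,\psi\subset\mathcal{X}\times\mathcal{Y}$ of a set-valued
mapping $\psi : \mathcal{X}\to\mathcal{Y}$ is the set

    $\mathrm{Gr}\,\psi := \{\,(x,y) : x\in\mathcal{X};\ y\in\psi(x) \,\}$.

If $\psi$ is measurable then any measurable function $h : \mathcal{X}\to\mathcal{Y}$ with $h(x)\in\psi(x)$ for
every $x\in\mathcal{X}$ is called a (measurable) selector.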



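Lemma 2.1. Let both $\mathbf{Y}\subset\mathbb{R}^p$ and $\mathbf{K}$ in (2.2) be compact. Then the set-valued
mapping $\mathbf{y}\mapsto\mathbf{K}_{\mathbf{y}}$ is Borel-measurable. In addition:

(a) The mapping $\mathbf{y}\mapsto J(\mathbf{y})$ is measurable.

(b) There exists a measurable selector $g : \mathbf{Y}\to\mathbf{K}_{\mathbf{y}}$ such that $J(\mathbf{y}) = f(g(\mathbf{y}),\mathbf{y})$
for every $\mathbf{y}\in\mathbf{Y}$.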
Proof. As $\mathbf{K}$ and $\mathbf{Y}$ are both compact, the set-valued mapping $\mathbf{y}\mapsto\mathbf{K}_{\mathbf{y}}\subset\mathbb{R}^n$ is
compact-valued. Moreover, the graph of $\mathbf{y}\mapsto\mathbf{K}_{\mathbf{y}}$ is by definition the set $\mathbf{K}$, which is
a Borel subset of $\mathbb{R}^n\times\mathbb{R}^p$. Hence, by [11, Proposition D.4], $\mathbf{y}\mapsto\mathbf{K}_{\mathbf{y}}$ is a measurable
function from $\mathbf{Y}$ to the space of nonempty compact subsets of $\mathbb{R}^n$, topologized by
the Hausdorff metric. Next, since $\mathbf{x}\mapsto f_{\mathbf{y}}(\mathbf{x})$ is continuous for every $\mathbf{y}\in\mathbf{Y}$, (a)
and (b) follow from e.g. [11, Proposition D.5]. □

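Theorem 2.2. Let both $\mathbf{Y}\subset\mathbb{R}^p$ and $\mathbf{K}$ in (2.2) be compact and assume that for
every $\mathbf{y}\in\mathbf{Y}$, the set $\mathbf{K}_{\mathbf{y}}\subset\mathbb{R}^n$ in (2.3) is nonempty. Let $\mathbf{P}$ be the optimization
problem (2.4) and let $\mathbf{X}^*_{\mathbf{y}} := \{\mathbf{x}\in\mathbf{K}_{\mathbf{y}} : f(\mathbf{x},\mathbf{y}) = J(\mathbf{y})\}$, $\mathbf{y}\in\mathbf{Y}$. Then:

(a) $\rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y})$ and $\mathbf{P}$ has an optimal solution.

(b) For every optimal solution $\mu^*$ of $\mathbf{P}$, and for almost all $\mathbf{y}\in\mathbf{Y}$, there is a
probability measure $\psi^*(d\mathbf{x}\,|\,\mathbf{y})$ on $\mathbf{X}^*_{\mathbf{y}}$ such that:

(2.5)    $\mu^*(C\times B) = \int_{B\cap\mathbf{Y}}\psi^*(C\cap\mathbf{X}^*_{\mathbf{y}}\,|\,\mathbf{y})\,d\varphi(\mathbf{y})$,    $\forall B\in\mathcal{B}(\mathbb{R}^p),\ C\in\mathcal{B}(\mathbb{R}^n)$.

(c) Assume that for almost all $\mathbf{y}\in\mathbf{Y}$, the set of minimizers $\mathbf{X}^*_{\mathbf{y}}$ is the singleton
$\{\mathbf{x}^*(\mathbf{y})\}$ for some $\mathbf{x}^*(\mathbf{y})\in\mathbf{K}_{\mathbf{y}}$. Then there is a measurable mapping $g : \mathbf{Y}\to\mathbf{K}_{\mathbf{y}}$
such that

(2.6)    $g(\mathbf{y}) = \mathbf{x}^*(\mathbf{y})$ for every $\mathbf{y}\in\mathbf{Y}$;    $\rho = \int_{\mathbf{Y}} f(g(\mathbf{y}),\mathbf{y})\,d\varphi(\mathbf{y})$,

and for every $\alpha\in\mathbb{N}^n$ and $\beta\in\mathbb{N}^p$:

(2.7)    $\int_{\mathbf{K}}\mathbf{x}^\alpha\mathbf{y}^\beta\,d\mu^*(\mathbf{x},\mathbf{y}) = \int_{\mathbf{Y}}\mathbf{y}^\beta g(\mathbf{y})^\alpha\,d\varphi(\mathbf{y})$.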

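Proof. (a) As $\mathbf{K}$ is compact then so is $\mathbf{K}_{\mathbf{y}}$ for every $\mathbf{y}\in\mathbf{Y}$. Next, as $\mathbf{K}_{\mathbf{y}}\neq\emptyset$ for
every $\mathbf{y}\in\mathbf{Y}$ and $f$ is continuous, the set $\mathbf{X}^*_{\mathbf{y}} := \{\mathbf{x}\in\mathbf{K}_{\mathbf{y}} : f(\mathbf{x},\mathbf{y}) = J(\mathbf{y})\}$ is
nonempty for every $\mathbf{y}\in\mathbf{Y}$. Let $\mu$ be any feasible solution of $\mathbf{P}$, so that by definition
its marginal on $\mathbb{R}^p$ is just $\varphi$. Since $\mathbf{X}^*_{\mathbf{y}}\neq\emptyset$ for all $\mathbf{y}\in\mathbf{Y}$, one has $f_{\mathbf{y}}(\mathbf{x})\geq J(\mathbf{y})$ for all
$\mathbf{x}\in\mathbf{K}_{\mathbf{y}}$ and all $\mathbf{y}\in\mathbf{Y}$. So, $f(\mathbf{x},\mathbf{y})\geq J(\mathbf{y})$ for all $(\mathbf{x},\mathbf{y})\in\mathbf{K}$ and therefore

    $\int_{\mathbf{K}} f\,d\mu \ \geq\ \int_{\mathbf{K}} J(\mathbf{y})\,d\mu \ =\ \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi$,

which proves that $\rho\geq\int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi$.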

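On the other hand, recall that $\mathbf{K}_{\mathbf{y}}\neq\emptyset$ for all $\mathbf{y}\in\mathbf{Y}$. Consider the set-valued mapping
$\mathbf{y}\mapsto\mathbf{X}^*_{\mathbf{y}}\subset\mathbf{K}_{\mathbf{y}}$. As $f$ is continuous and $\mathbf{K}$ is compact, $\mathbf{X}^*_{\mathbf{y}}$ is compact-valued.
In addition, as $f_{\mathbf{y}}$ is continuous, by [11, D6] (or [20]) there exists a measurable
selector $g : \mathbf{Y}\to\mathbf{X}^*_{\mathbf{y}}$ (and so $f(g(\mathbf{y}),\mathbf{y}) = J(\mathbf{y})$). Therefore, for every $\mathbf{y}\in\mathbf{Y}$, let
$\psi^*_{\mathbf{y}}$ be the Dirac probability measure with support on the singleton $g(\mathbf{y})\in\mathbf{X}^*_{\mathbf{y}}$, and
let $\mu$ be the probability measure on $\mathbf{K}$ defined by:

    $\mu(C\times B) := \int_{B} 1_C(g(\mathbf{y}))\,\varphi(d\mathbf{y})$,    $\forall B\in\mathcal{B}(\mathbb{R}^p),\ C\in\mathcal{B}(\mathbb{R}^n)$.

(The measure $\mu$ is well-defined because $g$ is measurable.) Then $\mu$ is feasible for $\mathbf{P}$
and

    $\int_{\mathbf{K}} f\,d\mu = \int_{\mathbf{Y}}\left[\int_{\mathbf{K}_{\mathbf{y}}} f(\mathbf{x},\mathbf{y})\,d\delta_{g(\mathbf{y})}\right]d\varphi(\mathbf{y}) = \int_{\mathbf{Y}} f(g(\mathbf{y}),\mathbf{y})\,d\varphi(\mathbf{y}) = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y})$,

which shows that $\mu$ is an optimal solution of $\mathbf{P}$ and $\rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y})$.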

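(b) Let $\mu^*$ be an arbitrary optimal solution of $\mathbf{P}$, hence supported on $\mathbf{K}$.
As $\mathbf{K}$ is contained in the cartesian product $\mathbb{R}^n\times\mathbb{R}^p$, the probability
measure $\mu^*$ can be disintegrated as

    $\mu^*(C\times B) = \int_{B\cap\mathbf{Y}}\psi^*(C\cap\mathbf{K}_{\mathbf{y}}\,|\,\mathbf{y})\,d\varphi(\mathbf{y})$,    $\forall B\in\mathcal{B}(\mathbb{R}^p),\ C\in\mathcal{B}(\mathbb{R}^n)$,

where for all $\mathbf{y}\in\mathbf{Y}$, $\psi^*(\cdot\,|\,\mathbf{y})$ is a probability measure on $\mathbf{K}_{\mathbf{y}}$. (The object $\psi^*(\cdot\,|\,\cdot)$
is called a stochastic kernel; see e.g. [8, p. 88-89] or [11, D8].) Hence from (a),

    $\rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y}) = \int_{\mathbf{K}} f(\mathbf{x},\mathbf{y})\,d\mu^*(\mathbf{x},\mathbf{y}) = \int_{\mathbf{Y}}\left(\int_{\mathbf{K}_{\mathbf{y}}} f(\mathbf{x},\mathbf{y})\,\psi^*(d\mathbf{x}\,|\,\mathbf{y})\right)d\varphi(\mathbf{y})$.

Therefore, using $f(\mathbf{x},\mathbf{y})\geq J(\mathbf{y})$ on $\mathbf{K}$,

    $0 = \int_{\mathbf{Y}}\left(\int_{\mathbf{K}_{\mathbf{y}}}\big[J(\mathbf{y}) - f(\mathbf{x},\mathbf{y})\big]\,\psi^*(d\mathbf{x}\,|\,\mathbf{y})\right)d\varphi(\mathbf{y})$,

where the integrand $J(\mathbf{y}) - f(\mathbf{x},\mathbf{y})$ is nonpositive on $\mathbf{K}$,
which implies $\psi^*(\mathbf{X}^*_{\mathbf{y}}\,|\,\mathbf{y}) = 1$ for almost all $\mathbf{y}\in\mathbf{Y}$.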

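(c) Let $g : \mathbf{Y}\to\mathbf{K}_{\mathbf{y}}$ be the measurable mapping of Lemma 2.1(b). As $J(\mathbf{y}) =
f(g(\mathbf{y}),\mathbf{y})$ and $(g(\mathbf{y}),\mathbf{y})\in\mathbf{K}$, necessarily $g(\mathbf{y})\in\mathbf{X}^*_{\mathbf{y}}$ for every $\mathbf{y}\in\mathbf{Y}$. Next,
let $\mu^*$ be an optimal solution of $\mathbf{P}$, and let $\alpha\in\mathbb{N}^n$, $\beta\in\mathbb{N}^p$. Then

    $\int_{\mathbf{K}}\mathbf{x}^\alpha\mathbf{y}^\beta\,d\mu^*(\mathbf{x},\mathbf{y}) = \int_{\mathbf{Y}}\mathbf{y}^\beta\left(\int_{\mathbf{X}^*_{\mathbf{y}}}\mathbf{x}^\alpha\,\psi^*(d\mathbf{x}\,|\,\mathbf{y})\right)d\varphi(\mathbf{y}) = \int_{\mathbf{Y}}\mathbf{y}^\beta g(\mathbf{y})^\alpha\,d\varphi(\mathbf{y})$,

the desired result. □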
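
As a numerical sanity check of Theorem 2.2(a), the sketch below evaluates $\rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi(\mathbf{y})$ by brute force on a toy instance. The instance is hypothetical and chosen for illustration only: $f(x,y) = x^2 - 2yx$, $\mathbf{K}_y = [-1,1]$, $\mathbf{Y} = [0,2]$ with $\varphi$ uniform, for which $x^*(y) = \min(y,1)$ and $\rho = -7/6$. Grid search and the midpoint rule stand in for exact minimization and integration.

```python
def value_function(f, x_grid, y):
    """Approximate J(y) as the minimum of f(x, y) over a grid of feasible x."""
    return min(f(x, y) for x in x_grid)


def rho_midpoint(f, x_grid, y_lo, y_hi, n_y):
    """Midpoint-rule approximation of int_Y J(y) dphi(y), phi uniform on [y_lo, y_hi]."""
    step = (y_hi - y_lo) / n_y
    total = 0.0
    for j in range(n_y):
        y = y_lo + (j + 0.5) * step
        total += value_function(f, x_grid, y)
    return total / n_y  # uniform probability weights 1 / n_y


# Hypothetical toy instance (not from the text): f(x, y) = x^2 - 2 y x on [-1, 1] x [0, 2].
f = lambda x, y: x * x - 2.0 * y * x
x_grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
rho = rho_midpoint(f, x_grid, 0.0, 2.0, 400)
```

This is exactly the pointwise discretization in $\mathbf{y}$ that the moment approach is meant to avoid; it is useful only as a reference value on small examples.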

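An optimal solution $\mu^*$ of $\mathbf{P}$ encodes all information on the optimal solutions
$\mathbf{x}^*(\mathbf{y})$ of $\mathbf{P}_{\mathbf{y}}$. For instance, let $\mathbf{B}$ be a given Borel set of $\mathbb{R}^n$. Then from Theorem 2.2,

    $\mathrm{Prob}(\mathbf{x}^*(\mathbf{y})\in\mathbf{B}) = \mu^*(\mathbf{B}\times\mathbb{R}^p) = \int_{\mathbf{Y}}\psi^*(\mathbf{B}\,|\,\mathbf{y})\,d\varphi(\mathbf{y})$,

with $\psi^*$ as in Theorem 2.2(b).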

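Consequently, if one knows an optimal solution $\mu^*$ of $\mathbf{P}$ then one may evaluate
functionals on the solutions of $\mathbf{P}_{\mathbf{y}}$, $\mathbf{y}\in\mathbf{Y}$. That is, assuming that for almost all
$\mathbf{y}\in\mathbf{Y}$, problem $\mathbf{P}_{\mathbf{y}}$ has a unique optimal solution $\mathbf{x}^*(\mathbf{y})$, and given a measurable
mapping $h : \mathbb{R}^n\to\mathbb{R}^q$, one may evaluate the functional

    $\int_{\mathbf{Y}} h(\mathbf{x}^*(\mathbf{y}))\,d\varphi(\mathbf{y})$.

For instance, with $\mathbf{x}\mapsto h(\mathbf{x}) := \mathbf{x}$ one obtains the mean vector $\mathrm{E}_\varphi(\mathbf{x}^*(\mathbf{y})) :=
\int_{\mathbf{Y}}\mathbf{x}^*(\mathbf{y})\,d\varphi(\mathbf{y})$ of optimal solutions $\mathbf{x}^*(\mathbf{y})$, $\mathbf{y}\in\mathbf{Y}$.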



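Corollary 2.3. Let both $\mathbf{Y}\subset\mathbb{R}^p$ and $\mathbf{K}$ in (2.2) be compact. Assume that for
every $\mathbf{y}\in\mathbf{Y}$, the set $\mathbf{K}_{\mathbf{y}}\subset\mathbb{R}^n$ in (2.3) is nonempty, and for almost all $\mathbf{y}\in\mathbf{Y}$,
the set $\mathbf{X}^*_{\mathbf{y}} := \{\mathbf{x}\in\mathbf{K}_{\mathbf{y}} : J(\mathbf{y}) = f(\mathbf{x},\mathbf{y})\}$ is the singleton $\{\mathbf{x}^*(\mathbf{y})\}$. Then for every
measurable mapping $h : \mathbb{R}^n\to\mathbb{R}^q$,

(2.8)    $\int_{\mathbf{Y}} h(\mathbf{x}^*(\mathbf{y}))\,d\varphi(\mathbf{y}) = \int_{\mathbf{K}} h(\mathbf{x})\,d\mu^*(\mathbf{x},\mathbf{y})$,

where $\mu^*$ is an optimal solution of $\mathbf{P}$.

Proof. By Theorem 2.2,

    $\int_{\mathbf{K}} h(\mathbf{x})\,d\mu^*(\mathbf{x},\mathbf{y}) = \int_{\mathbf{Y}}\left[\int_{\mathbf{X}^*_{\mathbf{y}}} h(\mathbf{x})\,\psi^*(d\mathbf{x}\,|\,\mathbf{y})\right]d\varphi(\mathbf{y}) = \int_{\mathbf{Y}} h(\mathbf{x}^*(\mathbf{y}))\,d\varphi(\mathbf{y})$.

□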

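2.2. Duality. Consider the following infinite-dimensional linear program $\mathbf{P}^*$:

(2.9)    $\rho^* := \sup_{p\in\mathbb{R}[\mathbf{y}]}\ \int_{\mathbf{Y}} p\,d\varphi$
         s.t. $f(\mathbf{x},\mathbf{y}) - p(\mathbf{y})\geq 0$    $\forall(\mathbf{x},\mathbf{y})\in\mathbf{K}$.

Then $\mathbf{P}^*$ is a dual of $\mathbf{P}$.

Lemma 2.4. Let both $\mathbf{Y}\subset\mathbb{R}^p$ and $\mathbf{K}$ in (2.2) be compact and let $\mathbf{P}$ and $\mathbf{P}^*$ be as
in (2.4) and (2.9) respectively. Then there is no duality gap, i.e., $\rho = \rho^*$.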



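Proof. For a topological space $\mathcal{X}$ denote by $C(\mathcal{X})$ the space of bounded continuous
functions on $\mathcal{X}$. Let $\mathcal{M}(\mathbf{K})$ be the vector space of finite signed Borel measures
on $\mathbf{K}$ (and so $\mathbf{M}(\mathbf{K})$ is its positive cone). Let $\pi : \mathcal{M}(\mathbf{K})\to\mathcal{M}(\mathbf{Y})$ be defined by
$(\pi\mu)(B) = \mu((\mathbb{R}^n\times B)\cap\mathbf{K})$ for all $B\in\mathcal{B}(\mathbf{Y})$, with adjoint mapping $\pi^* : C(\mathbf{Y})\to
C(\mathbf{K})$ defined as

    $(\mathbf{x},\mathbf{y})\mapsto(\pi^*h)(\mathbf{x},\mathbf{y}) := h(\mathbf{y})$,    $\forall h\in C(\mathbf{Y})$.

Put (2.4) in the framework of infinite-dimensional linear programs on vector spaces,
as described in e.g. [1]. That is:

    $\rho = \inf_{\mu\in\mathcal{M}(\mathbf{K})}\{\,\langle f,\mu\rangle : \pi\mu = \varphi,\ \mu\geq 0\,\}$,

with dual:

    $\tilde\rho = \sup_{h\in C(\mathbf{Y})}\{\,\langle h,\varphi\rangle : f - \pi^*h\geq 0 \text{ on } \mathbf{K}\,\}$.

One first proves that $\rho = \tilde\rho$ and then $\tilde\rho = \rho^*$.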

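By [1, Theor. 3.10], to get $\rho = \tilde\rho$, it suffices to prove that the set $D :=
\{(\pi\mu, \langle f,\mu\rangle) : \mu\in\mathbf{M}(\mathbf{K})\}$ is closed for the respective weak ★ topologies $\sigma(\mathcal{M}(\mathbf{Y})\times
\mathbb{R}, C(\mathbf{Y})\times\mathbb{R})$ and $\sigma(\mathcal{M}(\mathbf{K}), C(\mathbf{K}))$. Therefore consider a converging sequence
$\pi\mu_n\to a$ with $\mu_n\in\mathbf{M}(\mathbf{K})$. The sequence $(\mu_n)$ is uniformly bounded because

    $\mu_n(\mathbf{K}) = (\pi\mu_n)(\mathbf{Y}) = \langle 1,\pi\mu_n\rangle\to\langle 1,a\rangle = a(\mathbf{Y})$.

But by the Banach-Alaoglu Theorem (see e.g. [5]), the bounded closed sets of
$\mathbf{M}(\mathbf{K})$ are compact in the weak ★ topology. And so $\mu_{n_k}\to\mu$ for some $\mu\in\mathbf{M}(\mathbf{K})$
and some subsequence $(n_k)$. Next, observe that for $h\in C(\mathbf{Y})$ arbitrary,

    $\langle h,\pi\mu_{n_k}\rangle = \langle\pi^*h,\mu_{n_k}\rangle\to\langle\pi^*h,\mu\rangle = \langle h,\pi\mu\rangle$,

where we have used that $\pi^*h\in C(\mathbf{K})$. Hence combining the above with $\pi\mu_{n_k}\to a$,
we obtain $\pi\mu = a$. Similarly, $\langle f,\mu_{n_k}\rangle\to\langle f,\mu\rangle$ because $f\in C(\mathbf{K})$. Hence $D$ is
closed and the desired result $\rho = \tilde\rho$ follows.

We next prove that $\tilde\rho = \rho^*$. Given $\epsilon > 0$ fixed arbitrary, there is a function $h_\epsilon\in
C(\mathbf{Y})$ such that $f - h_\epsilon\geq 0$ on $\mathbf{K}$ and $\int h_\epsilon\,d\varphi\geq\tilde\rho - \epsilon$. By compactness of $\mathbf{Y}$ and the
Stone-Weierstrass theorem, there is $p_\epsilon\in\mathbb{R}[\mathbf{y}]$ such that $\sup_{\mathbf{y}\in\mathbf{Y}}|h_\epsilon(\mathbf{y}) - p_\epsilon(\mathbf{y})|\leq\epsilon$.
Hence the polynomial $\tilde p_\epsilon := p_\epsilon - \epsilon$ is feasible with value $\int_{\mathbf{Y}}\tilde p_\epsilon\,d\varphi\geq\tilde\rho - 3\epsilon$, and as $\epsilon$
was arbitrary, the result $\tilde\rho = \rho^*$ follows. □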

As shown next, optimal or nearly optimal solutions of $\mathbf{P}^*$ provide us with
polynomial lower approximations of the optimal value function $\mathbf{y}\mapsto J(\mathbf{y})$ that converge
to $J(\cdot)$ in the $L_1(\varphi)$ norm. Moreover, one may also obtain a piecewise polynomial
approximation that converges to $J(\cdot)$ almost uniformly. (Recall that a sequence of
measurable functions $(g_n)$ on a measure space $(\mathbf{Y},\mathcal{B}(\mathbf{Y}),\varphi)$ converges to $g$ almost
uniformly if and only if for every $\epsilon > 0$, there is a set $A\in\mathcal{B}(\mathbf{Y})$ such that $\varphi(A) < \epsilon$
and $g_n\to g$ uniformly on $\mathbf{Y}\setminus A$.)

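Corollary 2.5. Let both $\mathbf{Y}\subset\mathbb{R}^p$ and $\mathbf{K}$ in (2.2) be compact and assume that for
every $\mathbf{y}\in\mathbf{Y}$, the set $\mathbf{K}_{\mathbf{y}}$ is nonempty. Let $\mathbf{P}^*$ be as in (2.9). If $(p_i)_{i\in\mathbb{N}}\subset\mathbb{R}[\mathbf{y}]$ is
a maximizing sequence of (2.9) then

(2.10)    $\int_{\mathbf{Y}}|J(\mathbf{y}) - p_i(\mathbf{y})|\,d\varphi(\mathbf{y})\to 0$ as $i\to\infty$.

Moreover, define the functions $(\tilde p_i)$ as follows:

    $\tilde p_0 := p_0$,    $\mathbf{y}\mapsto\tilde p_i(\mathbf{y}) := \max[\tilde p_{i-1}(\mathbf{y}), p_i(\mathbf{y})]$,    $i = 1, 2, \ldots$

Then $\tilde p_i\to J(\cdot)$ almost uniformly.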

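Proof. By Lemma 2.4 we already know that $\rho^* = \rho$ and so

    $\int_{\mathbf{Y}} p_i(\mathbf{y})\,d\varphi(\mathbf{y})\uparrow\rho^* = \rho = \int_{\mathbf{Y}} J(\mathbf{y})\,d\varphi$.

Next, by feasibility of $p_i$ in (2.9),

    $f(\mathbf{x},\mathbf{y})\geq p_i(\mathbf{y})\ \ \forall(\mathbf{x},\mathbf{y})\in\mathbf{K}\ \Rightarrow\ \inf_{\mathbf{x}\in\mathbf{K}_{\mathbf{y}}} f(\mathbf{x},\mathbf{y}) = J(\mathbf{y})\geq p_i(\mathbf{y})\ \ \forall\mathbf{y}\in\mathbf{Y}$.

Hence (2.10) follows from $p_i(\mathbf{y})\leq J(\mathbf{y})$ on $\mathbf{Y}$.

Next, with $\mathbf{y}\in\mathbf{Y}$ fixed, the sequence $(\tilde p_i(\mathbf{y}))_i$ is obviously monotone
nondecreasing and bounded above by $J(\mathbf{y})$, hence has a limit $p^*(\mathbf{y})\leq J(\mathbf{y})$. Therefore
$\tilde p_i$ has the pointwise limit $\mathbf{y}\mapsto p^*(\mathbf{y})\leq J(\mathbf{y})$. Also, by the Monotone convergence
theorem, $\int_{\mathbf{Y}}\tilde p_i(\mathbf{y})\,d\varphi(\mathbf{y})\to\int_{\mathbf{Y}} p^*(\mathbf{y})\,d\varphi(\mathbf{y})$. This latter fact combined with (2.10)
and $p_i(\mathbf{y})\leq\tilde p_i(\mathbf{y})\leq J(\mathbf{y})$ yields

    $0 = \lim_{i\to\infty}\int_{\mathbf{Y}}(J(\mathbf{y}) - \tilde p_i(\mathbf{y}))\,d\varphi(\mathbf{y}) = \int_{\mathbf{Y}}(J(\mathbf{y}) - p^*(\mathbf{y}))\,d\varphi(\mathbf{y})$,

which in turn implies that $p^*(\mathbf{y}) = J(\mathbf{y})$ for almost all $\mathbf{y}\in\mathbf{Y}$. Therefore $\tilde p_i(\mathbf{y})\to
J(\mathbf{y})$ for almost all $\mathbf{y}\in\mathbf{Y}$. And so, by Egoroff's Theorem [2, Theor. 2.5.5],
$\tilde p_i\to J(\cdot)$ almost uniformly. □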
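
The piecewise polynomial approximations $\tilde p_i = \max[\tilde p_{i-1}, p_i]$ of Corollary 2.5 are simply running maxima of the lower bounds. A minimal Python sketch follows, using a hypothetical value function $J(y) = |y - 1/2|$ on $[0,1]$ and hand-picked lower bounds rather than an actual maximizing sequence of (2.9); all names are illustrative.

```python
from fractions import Fraction


def running_max(polys):
    """Given callables p_0, p_1, ..., return callables y -> max(p_0(y), ..., p_i(y))."""
    def make(i):
        return lambda y: max(p(y) for p in polys[: i + 1])
    return [make(i) for i in range(len(polys))]


half = Fraction(1, 2)
J = lambda y: abs(y - half)  # hypothetical optimal value function on [0, 1]
# Hand-picked polynomial lower bounds of J (assumed, not computed from (2.9)).
lower_bounds = [lambda y: half - y, lambda y: y - half, lambda y: Fraction(0)]
tilde = running_max(lower_bounds)
```

Each `tilde[i]` remains a lower bound of `J` and is pointwise nondecreasing in `i`, which is the monotonicity used in the proof above.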

3. A HIERARCHY OF SEMIDEFINITE RELAXATIONS 

In general, solving the infinite-dimensional problem $\mathbf{P}$ and getting an optimal
solution $\mu^*$ is impossible. One possibility is to use numerical discretization schemes
on a box containing $\mathbf{K}$; see for instance [12]. But in the present context of
parametric optimization, if one selects finitely many grid points $(\mathbf{x},\mathbf{y})\in\mathbf{K}$, one is implicitly
considering solving (or rather approximating) $\mathbf{P}_{\mathbf{y}}$ for finitely many points $\mathbf{y}$ in a
grid of $\mathbf{Y}$, which we want to avoid. To avoid this numerical discretization scheme
we will use specific features of $\mathbf{P}$ when its data $f$ (resp. $\mathbf{K}$) is a polynomial (resp.
a compact basic semi-algebraic set).

Therefore in this section we consider a polynomial parametric
optimization problem, a special case of (2.1), as we assume the following:

• $f\in\mathbb{R}[\mathbf{x},\mathbf{y}]$ and $h_j\in\mathbb{R}[\mathbf{x},\mathbf{y}]$, for every $j = 1,\ldots,m$.

• $\mathbf{K}$ is compact and $\mathbf{Y}\subset\mathbb{R}^p$ is a compact basic semi-algebraic set.

Hence the set $\mathbf{K}\subset\mathbb{R}^n\times\mathbb{R}^p$ in (2.2) is a compact basic semi-algebraic set. We
also assume that there is a probability measure $\varphi$ on $\mathbf{Y}$, absolutely continuous with
respect to the Lebesgue measure, whose moments $\gamma = (\gamma_\beta)$, $\beta\in\mathbb{N}^p$, are available.
As already mentioned, if $\mathbf{Y}$ is a simple set (like e.g. a simplex or a box) then one
may choose $\varphi$ to be the probability measure uniformly distributed on $\mathbf{Y}$, for which
all moments can be computed easily. Sometimes, in the context of optimization
with data uncertainty, the probability measure $\varphi$ is already specified and in this
case we assume that its moments $\gamma = (\gamma_\beta)$, $\beta\in\mathbb{N}^p$, are available.

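3.1. Notation and preliminaries. Let $\mathbb{N}^n_i := \{\alpha\in\mathbb{N}^n : |\alpha|\leq i\}$ with $|\alpha| =
\sum_i\alpha_i$. With a sequence $\mathbf{z} = (z_{\alpha\beta})$, $\alpha\in\mathbb{N}^n$, $\beta\in\mathbb{N}^p$, indexed in the canonical basis
$(\mathbf{x}^\alpha\mathbf{y}^\beta)$ of $\mathbb{R}[\mathbf{x},\mathbf{y}]$, let $L_{\mathbf{z}} : \mathbb{R}[\mathbf{x},\mathbf{y}]\to\mathbb{R}$ be the linear mapping:

    $f\ \Big(= \sum_{\alpha\beta} f_{\alpha\beta}\,\mathbf{x}^\alpha\mathbf{y}^\beta\Big)\ \mapsto\ L_{\mathbf{z}}(f) := \sum_{\alpha\beta} f_{\alpha\beta}\,z_{\alpha\beta}$,    $f\in\mathbb{R}[\mathbf{x},\mathbf{y}]$.

Moment matrix. The moment matrix $\mathbf{M}_i(\mathbf{z})$ associated with a sequence $\mathbf{z} = (z_{\alpha\beta})$
has its rows and columns indexed in the canonical basis $(\mathbf{x}^\alpha\mathbf{y}^\beta)$, and with entries

    $\mathbf{M}_i(\mathbf{z})((\alpha,\beta),(\delta,\gamma)) = L_{\mathbf{z}}(\mathbf{x}^\alpha\mathbf{y}^\beta\,\mathbf{x}^\delta\mathbf{y}^\gamma) = z_{(\alpha+\delta)(\beta+\gamma)}$,

for every $\alpha,\delta\in\mathbb{N}^n_i$ and every $\beta,\gamma\in\mathbb{N}^p_i$.

Localizing matrix. Let $q$ be the polynomial $(\mathbf{x},\mathbf{y})\mapsto q(\mathbf{x},\mathbf{y}) := \sum_{u,v} q_{uv}\,\mathbf{x}^u\mathbf{y}^v$. The
localizing matrix $\mathbf{M}_i(q\,\mathbf{z})$ associated with $q\in\mathbb{R}[\mathbf{x},\mathbf{y}]$ and a sequence $\mathbf{z} = (z_{\alpha\beta})$ has
its rows and columns indexed in the canonical basis $(\mathbf{x}^\alpha\mathbf{y}^\beta)$, and with entries

    $\mathbf{M}_i(q\,\mathbf{z})((\alpha,\beta),(\delta,\gamma)) = L_{\mathbf{z}}(q(\mathbf{x},\mathbf{y})\,\mathbf{x}^\alpha\mathbf{y}^\beta\,\mathbf{x}^\delta\mathbf{y}^\gamma) = \sum_{u,v} q_{uv}\,z_{(\alpha+\delta+u)(\beta+\gamma+v)}$,

for every $\alpha,\delta\in\mathbb{N}^n_i$ and every $\beta,\gamma\in\mathbb{N}^p_i$.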
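
To make the indexing concrete, the following Python sketch assembles moment and localizing matrices from a sequence stored as a dictionary. The storage convention (one exponent tuple for the concatenated variables $(\mathbf{x},\mathbf{y})$, rows and columns restricted to monomials of total degree at most $i$) and the function names are assumptions made for illustration. The example sequence is the moment sequence of a hypothetical measure with $n = p = 1$, $x = y$ and $y$ uniform on $[0,1]$, so that $z_{ab} = 1/(a+b+1)$.

```python
from fractions import Fraction
from itertools import product


def monomial_basis(num_vars, degree):
    """Exponent tuples of total degree <= degree, ordered by degree, then lexicographically."""
    expos = [e for e in product(range(degree + 1), repeat=num_vars) if sum(e) <= degree]
    return sorted(expos, key=lambda e: (sum(e), e))


def add(e1, e2):
    """Exponent of the product of two monomials."""
    return tuple(a + b for a, b in zip(e1, e2))


def moment_matrix(z, num_vars, i):
    """Entry (r, c) is z indexed by the exponent of (monomial r) * (monomial c)."""
    basis = monomial_basis(num_vars, i)
    return [[z[add(r, c)] for c in basis] for r in basis]


def localizing_matrix(q, z, num_vars, i):
    """q maps exponent tuples to coefficients; entry (r, c) is L_z(q * monomial r * monomial c)."""
    basis = monomial_basis(num_vars, i)
    return [
        [sum(coef * z[add(add(r, c), u)] for u, coef in q.items()) for c in basis]
        for r in basis
    ]


# Hypothetical data: moments of the measure supported on {x = y}, y uniform on [0, 1].
z = {(a, b): Fraction(1, a + b + 1) for a in range(5) for b in range(5) if a + b <= 4}
M1 = moment_matrix(z, num_vars=2, i=1)        # basis order: 1, y, x
q = {(0, 1): 1, (0, 2): -1}                   # q(x, y) = y (1 - y)
L1 = localizing_matrix(q, z, num_vars=2, i=1)
```

In an actual relaxation the entries of `z` would be decision variables of a semidefinite program rather than known numbers; the sketch only shows how the matrices are indexed.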

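A sequence $\mathbf{z} = (z_{\alpha\beta})\subset\mathbb{R}$ has a representing finite Borel measure supported on
$\mathbf{K}$ if there exists a finite Borel measure $\mu$ such that

    $z_{\alpha\beta} = \int_{\mathbf{K}}\mathbf{x}^\alpha\mathbf{y}^\beta\,d\mu$,    $\forall\alpha\in\mathbb{N}^n,\ \beta\in\mathbb{N}^p$.

The next important result states a necessary and sufficient condition when $\mathbf{K}$ is
compact and its defining polynomials $(h_k)\subset\mathbb{R}[\mathbf{x},\mathbf{y}]$ satisfy some condition.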

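Assumption 3.1. Let $(h_j)_{j=1}^t\subset\mathbb{R}[\mathbf{x},\mathbf{y}]$ be a given family of polynomials. There
is some $N$ such that the quadratic polynomial $(\mathbf{x},\mathbf{y})\mapsto N - \|(\mathbf{x},\mathbf{y})\|^2$ can be written

    $N - \|(\mathbf{x},\mathbf{y})\|^2 = \sigma_0 + \sum_{j=1}^t\sigma_j h_j$,

for some s.o.s. polynomials $(\sigma_j)_{j=0}^t\subset\Sigma[\mathbf{x},\mathbf{y}]$.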

Theorem 3.2. Let $\mathbf{K} := \{(\mathbf{x},\mathbf{y}) : h_k(\mathbf{x},\mathbf{y})\geq 0,\ k = 1,\ldots,t\}$ and let $(h_k)_{k=1}^t$ satisfy
Assumption 3.1. A sequence $\mathbf{z} = (z_{\alpha\beta})$ has a representing measure on $\mathbf{K}$ if and
only if, for every $i = 0, 1, \ldots$:

    $\mathbf{M}_i(\mathbf{z})\succeq 0$;    $\mathbf{M}_i(h_k\,\mathbf{z})\succeq 0$,    $k = 1,\ldots,t$.

Theorem 3.2 is a direct consequence of Putinar's Positivstellensatz jTHl and [21].
Of course, when Assumption 3.1 holds then $\mathbf{K}$ is compact. On the other hand, if $\mathbf{K}$
is compact and one knows a bound $N$ for $\|(\mathbf{x},\mathbf{y})\|$ on $\mathbf{K}$ then it suffices to add the
redundant quadratic constraint $h_{t+1}(\mathbf{x},\mathbf{y})\ (:= N^2 - \|(\mathbf{x},\mathbf{y})\|^2)\geq 0$ to the definition
of $\mathbf{K}$, and Assumption 3.1 holds.

3.2. Semidefinite relaxations. To compute (or at least, approximate) the
optimal value $\rho$ of problem $\mathbf{P}$ in (2.4), we now provide a hierarchy of semidefinite
relaxations in the spirit of those defined in [13].

Let $\mathbf{K}\subset\mathbb{R}^n\times\mathbb{R}^p$ be as in (2.2), and let $\mathbf{Y}\subset\mathbb{R}^p$ be the compact semi-algebraic
set defined by:

(3.1)    $\mathbf{Y} := \{\,\mathbf{y}\in\mathbb{R}^p : h_k(\mathbf{y})\geq 0,\ k = m+1,\ldots,t\,\}$

for some polynomials $(h_k)_{k=m+1}^t\subset\mathbb{R}[\mathbf{y}]$; let $v_k := \lceil(\deg h_k)/2\rceil$ for every $k =
1,\ldots,t$. Next, let $\gamma = (\gamma_\beta)$ with

    $\gamma_\beta = \int_{\mathbf{Y}}\mathbf{y}^\beta\,d\varphi(\mathbf{y})$,    $\forall\beta\in\mathbb{N}^p$,

be the moments of a probability measure $\varphi$ on $\mathbf{Y}$, absolutely continuous with respect
to the Lebesgue measure, and let $i_0 := \max[\lceil(\deg f)/2\rceil, \max_k v_k]$. For $i\geq i_0$,
consider the following semidefinite relaxations:


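(3.2)    $\rho_i = \inf_{\mathbf{z}}\ L_{\mathbf{z}}(f)$
         s.t. $\mathbf{M}_i(\mathbf{z})\succeq 0$,
              $\mathbf{M}_{i-v_j}(h_j\,\mathbf{z})\succeq 0$,    $j = 1,\ldots,t$,
              $L_{\mathbf{z}}(\mathbf{y}^\beta) = \gamma_\beta$,    $\forall\beta\in\mathbb{N}^p_{2i}$.

Theorem 3.3. Let $\mathbf{K}$, $\mathbf{Y}$ be as in (2.2) and (3.1) respectively, and let $(h_k)_{k=1}^t$ satisfy
Assumption 3.1. Assume that for every $\mathbf{y}\in\mathbf{Y}$ the set $\mathbf{K}_{\mathbf{y}}$ is nonempty, and for
almost all $\mathbf{y}\in\mathbf{Y}$, $J(\mathbf{y})$ is attained at a unique optimal solution $\mathbf{x}^*(\mathbf{y})$. Consider
the semidefinite relaxations (3.2). Then:

(a) $\rho_i\uparrow\rho$ as $i\to\infty$.

(b) Let $\mathbf{z}^i$ be a nearly optimal solution of (3.2), e.g. such that $L_{\mathbf{z}^i}(f)\leq\rho_i + 1/i$,
and let $g : \mathbf{Y}\to\mathbf{K}_{\mathbf{y}}$ be the measurable mapping in Theorem 2.2(c). Then

(3.3)    $\lim_{i\to\infty} z^i_{\alpha\beta} = \int_{\mathbf{Y}}\mathbf{y}^\beta g(\mathbf{y})^\alpha\,d\varphi(\mathbf{y})$,    $\forall\alpha\in\mathbb{N}^n,\ \beta\in\mathbb{N}^p$.

In particular, for every $k = 1,\ldots,n$,

(3.4)    $\lim_{i\to\infty} z^i_{e(k)\beta} = \int_{\mathbf{Y}}\mathbf{y}^\beta g_k(\mathbf{y})\,d\varphi(\mathbf{y})$,    $\forall\beta\in\mathbb{N}^p$,

where $e(k) = (\delta_{j=k})_j\in\mathbb{N}^n$.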

The proof is postponed to Section ID 

Remark 3.4. Observe that if $\rho_i = +\infty$ for some index $i$ in the hierarchy (and
hence for all $i'\geq i$), then the set $\mathbf{K}_{\mathbf{y}}$ is empty for all $\mathbf{y}$ in some Borel set $B$ of $\mathbf{Y}$ with
$\varphi(B) > 0$. Conversely, one may prove that if $\mathbf{K}_{\mathbf{y}}$ is empty for all $\mathbf{y}$ in some Borel set
$B$ of $\mathbf{Y}$ with $\varphi(B) > 0$, then necessarily $\rho_i = +\infty$ for all $i$ sufficiently large. In other
words, the hierarchy of semidefinite relaxations (3.2) may also provide a certificate
of emptiness of $\mathbf{K}_{\mathbf{y}}$ for some Borel set of $\mathbf{Y}$ with positive Lebesgue measure.

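3.3. The dual semidefinite relaxations. The dual of the semidefinite relaxation (3.2)
reads:

(3.5)    $\rho^*_i = \sup_{p,(\sigma_j)}\ \int_{\mathbf{Y}} p\,d\varphi$
         s.t. $f - p = \sigma_0 + \sum_{j=1}^t\sigma_j h_j$,
              $p\in\mathbb{R}[\mathbf{y}]$;  $\sigma_j\in\Sigma[\mathbf{x},\mathbf{y}]$,  $j = 0, 1,\ldots,t$,
              $\deg p\leq 2i$,  $\deg\sigma_0\leq 2i$,  $\deg\sigma_j h_j\leq 2i$,  $j = 1,\ldots,t$.

Observe that (3.5) is a strengthening of (2.9), as one restricts to polynomials $p\in\mathbb{R}[\mathbf{y}]$
of degree at most $2i$ and the nonnegativity of $f - p$ in (2.9) is replaced with a stronger
requirement in (3.5). Therefore $\rho^*_i\leq\rho^*$ for every $i$.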

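Theorem 3.5. Let $\mathbf{K}$, $\mathbf{Y}$ be as in (2.2) and (3.1) respectively, and let $(h_k)_{k=1}^t$ satisfy
Assumption 3.1. Assume that for every $\mathbf{y}\in\mathbf{Y}$ the set $\mathbf{K}_{\mathbf{y}}$ is nonempty, and
consider the semidefinite relaxations (3.5). Then:

(a) $\rho^*_i\uparrow\rho$ as $i\to\infty$.

(b) Let $(p_i, (\sigma^i_j))$ be a nearly optimal solution of (3.5), e.g. such that $\int_{\mathbf{Y}} p_i\,d\varphi\geq
\rho^*_i - 1/i$. Then $p_i\leq J(\cdot)$ and

(3.6)    $\lim_{i\to\infty}\int_{\mathbf{Y}}(J(\mathbf{y}) - p_i(\mathbf{y}))\,d\varphi(\mathbf{y}) = 0$.

Moreover if one defines

    $\tilde p_0 := p_0$,    $\mathbf{y}\mapsto\tilde p_i(\mathbf{y}) := \max[\tilde p_{i-1}(\mathbf{y}), p_i(\mathbf{y})]$,    $i = 1, 2, \ldots$,

then $\tilde p_i\to J(\cdot)$ almost uniformly on $\mathbf{Y}$.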
Proof. Recall that by Lemma 2.4, $\rho = \rho^*$. Moreover let $(p_k)\subset\mathbb{R}[\mathbf{y}]$ be a maximizing
sequence of (2.9) as in Corollary 2.5 with value $s_k := \int_{\mathbf{Y}} p_k\,d\varphi$, and let $p'_k := p_k -
1/k$ for every $k$, so that $f - p'_k\geq 1/k$ on $\mathbf{K}$. By Theorem 3.2 there exist s.o.s.
polynomials $(\sigma^k_j)\subset\Sigma[\mathbf{x},\mathbf{y}]$ such that $f - p'_k = \sigma^k_0 + \sum_j\sigma^k_j h_j$. Letting $d_k$ be the
maximum degree of $\sigma^k_0$ and $\sigma^k_j h_j$, $j = 1,\ldots,t$, it follows that $(p'_k, (\sigma^k_j))$ is a
feasible solution of (3.5) with $i := d_k$. Hence $\rho^*\geq\rho^*_{d_k}\geq s_k - 1/k$ and the result
(a) follows because $s_k\to\rho^*$ and the sequence $(\rho^*_i)$ is monotone. Then (b) follows
from Corollary 2.5.

□

Hence Theorem 3.5 provides a lower polynomial approximation $p_i\in\mathbb{R}[\mathbf{y}]$ of the
optimal value function $J(\cdot)$. Its degree is bounded by $2i$, the order of the moments
$(\gamma_\beta)$ of $\varphi$ taken into account in the semidefinite relaxation (3.5). Moreover one may
even define a piecewise polynomial lower approximation $\tilde p_i$ that converges almost
uniformly to $J(\cdot)$ on $\mathbf{Y}$.

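Functionals of the optimal solutions. Theorem 3.3 provides a means of
approximating any polynomial functional on the optimal solutions of $\mathbf{P}_{\mathbf{y}}$, $\mathbf{y}\in\mathbf{Y}$.
Indeed,

Corollary 3.6. Let $\mathbf{K}$, $\mathbf{Y}$ be as in (2.2) and (3.1) respectively, and let $(h_k)_{k=1}^t$ satisfy
Assumption 3.1. Assume that for every $\mathbf{y}\in\mathbf{Y}$ the set $\mathbf{K}_{\mathbf{y}}$ is nonempty, and for
almost all $\mathbf{y}\in\mathbf{Y}$, $J(\mathbf{y})$ is attained at a unique optimal solution $\mathbf{x}^*(\mathbf{y})\in\mathbf{X}^*_{\mathbf{y}}$. Let
$h\in\mathbb{R}[\mathbf{x}]$,

    $\mathbf{x}\mapsto h(\mathbf{x}) := \sum_{\alpha\in\mathbb{N}^n} h_\alpha\,\mathbf{x}^\alpha$,

and let $\mathbf{z}^i$ be a nearly optimal solution of the semidefinite relaxations (3.2).
Then, for $i$ sufficiently large,

    $\int_{\mathbf{Y}} h(\mathbf{x}^*(\mathbf{y}))\,d\varphi(\mathbf{y})\ \approx\ \sum_{\alpha\in\mathbb{N}^n} h_\alpha\,z^i_{\alpha 0}$.

Proof. The proof is an immediate consequence of Theorem 3.3 and Corollary 2.3.

□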
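
Once moments $z_{\alpha 0}$ are available, evaluating a polynomial functional as in Corollary 3.6 is a finite sum. The Python sketch below shows this for the mean and variance; the dictionary layout (keys are exponent tuples for $(\mathbf{x},\mathbf{y})$) is an assumption, and the moments used are the exact limit moments of a hypothetical example with $x^*(y) = y$ and $y$ uniform on $[0,1]$, not the output of an SDP solver.

```python
from fractions import Fraction


def polynomial_functional(h, z, p):
    """Evaluate sum_alpha h_alpha * z_{alpha, 0}.

    h maps exponent tuples alpha (for x) to coefficients; z is keyed by the
    concatenated exponent tuple (alpha, beta) for (x, y); p is the number of parameters.
    """
    zero = (0,) * p
    return sum(coef * z[alpha + zero] for alpha, coef in h.items())


# Hypothetical limit moments for n = p = 1, x*(y) = y, y uniform on [0, 1].
z = {(a, b): Fraction(1, a + b + 1) for a in range(3) for b in range(3)}

mean = polynomial_functional({(1,): 1}, z, p=1)
second_moment = polynomial_functional({(2,): 1}, z, p=1)
variance = second_moment - mean ** 2
```

With moments taken from a relaxation of finite order, the same sums give approximations whose quality depends on how close $\mathbf{z}^i$ is to its limit.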

3.4. Persistence for Boolean variables. One interesting and potentially useful
application is in Boolean optimization. Indeed suppose that for some subset $I\subseteq
\{1,\ldots,n\}$, the variables $(x_i)$, $i\in I$, are boolean, that is, the definition of $\mathbf{K}$ in (2.2)
includes the constraints $x_i^2 - x_i = 0$, for every $i\in I$.

Then for instance, one might be interested in determining whether in an optimal
solution $\mathbf{x}^*(\mathbf{y})$ of $\mathbf{P}_{\mathbf{y}}$, and for some index $i\in I$, one has $x_i^*(\mathbf{y}) = 1$ (or $x_i^*(\mathbf{y}) = 0$)
for almost all values of the parameter $\mathbf{y}\in\mathbf{Y}$. In [3, 17] the probability that $x_k^*(\mathbf{y})$
is 1 is called the persistency of the boolean variable $x_k^*(\mathbf{y})$.

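Corollary 3.7. Let $\mathbf{K}$, $\mathbf{Y}$ be as in (2.2) and (3.1) respectively. Let $(h_k)_{k=1}^t$ satisfy
Assumption 3.1. Assume that for every $\mathbf{y}\in\mathbf{Y}$ the set $\mathbf{K}_{\mathbf{y}}$ is nonempty. Let $\mathbf{z}^i$ be a nearly
optimal solution of the semidefinite relaxations (3.2). Then for $k\in I$ fixed:

(a) $x_k^*(\mathbf{y}) = 1$ for almost all $\mathbf{y}\in\mathbf{Y}$, only if $\lim_{i\to\infty} z^i_{e(k)0} = 1$.

(b) $x_k^*(\mathbf{y}) = 0$ for almost all $\mathbf{y}\in\mathbf{Y}$, only if $\lim_{i\to\infty} z^i_{e(k)0} = 0$.

Assume that for almost all $\mathbf{y}\in\mathbf{Y}$, $J(\mathbf{y})$ is attained at a unique optimal solution
$\mathbf{x}^*(\mathbf{y})\in\mathbf{X}^*_{\mathbf{y}}$. Then $\mathrm{Prob}(x_k^*(\mathbf{y}) = 1) = \lim_{i\to\infty} z^i_{e(k)0}$, and so:

(c) $x_k^*(\mathbf{y}) = 1$ for almost all $\mathbf{y}\in\mathbf{Y}$, if and only if $\lim_{i\to\infty} z^i_{e(k)0} = 1$.

(d) $x_k^*(\mathbf{y}) = 0$ for almost all $\mathbf{y}\in\mathbf{Y}$, if and only if $\lim_{i\to\infty} z^i_{e(k)0} = 0$.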

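Proof. (a) The only if part. Let $\alpha := e(k)\in\mathbb{N}^n$. From the proof of Theorem 3.3,
there is a subsequence $(i_\ell)_\ell\subset(i)_i$ such that

    $\lim_{\ell\to\infty} z^{i_\ell}_{e(k)0} = \int_{\mathbf{K}} x_k\,d\mu^*$,

where $\mu^*$ is an optimal solution of $\mathbf{P}$. Hence, by Theorem 2.2(b), $\mu^*$ can be
disintegrated into $\psi^*(d\mathbf{x}\,|\,\mathbf{y})\,d\varphi(\mathbf{y})$ where $\psi^*(\cdot\,|\,\mathbf{y})$ is a probability measure on $\mathbf{X}^*_{\mathbf{y}}$ for every
$\mathbf{y}\in\mathbf{Y}$. Therefore,

    $\lim_{\ell\to\infty} z^{i_\ell}_{e(k)0} = \int_{\mathbf{Y}}\left(\int_{\mathbf{X}^*_{\mathbf{y}}} x_k\,\psi^*(d\mathbf{x}\,|\,\mathbf{y})\right)d\varphi(\mathbf{y})$
        $= \int_{\mathbf{Y}}\psi^*(\mathbf{X}^*_{\mathbf{y}}\,|\,\mathbf{y})\,d\varphi(\mathbf{y})$    [because $x_k = 1$ in $\mathbf{X}^*_{\mathbf{y}}$]
        $= \int_{\mathbf{Y}} d\varphi(\mathbf{y}) = 1$,

and as the subsequence was arbitrary, the whole sequence $(z^i_{e(k)0})$ converges
to 1, the desired result. The proof of (b), being exactly the same, is omitted.
Next, if for every $\mathbf{y}\in\mathbf{Y}$, $J(\mathbf{y})$ is attained at a singleton, by Theorem 3.3(b),

    $\lim_{i\to\infty} z^i_{e(k)0} = \int_{\mathbf{Y}} x_k^*(\mathbf{y})\,d\varphi(\mathbf{y}) = \varphi(\{\mathbf{y} : x_k^*(\mathbf{y}) = 1\}) = \mathrm{Prob}(x_k^*(\mathbf{y}) = 1)$,

from which (c) and (d) follow. □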
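
The identity $\mathrm{Prob}(x_k^*(\mathbf{y}) = 1) = \int_{\mathbf{Y}} x_k^*(\mathbf{y})\,d\varphi(\mathbf{y})$ used above can be checked by brute force on a small example. The Python sketch below does so for a hypothetical one-variable problem, $\min\{(1/3 - y)\,x : x\in\{0,1\}\}$ with $y$ uniform on $[0,1]$, whose persistency is $2/3$. Enumeration over $x$ and a midpoint rule in $y$ replace the semidefinite relaxations, so this is only a reference computation.

```python
from fractions import Fraction


def optimal_x(y):
    """Minimizer over x in {0, 1} of the hypothetical cost f(x, y) = (1/3 - y) * x."""
    cost = lambda x: (Fraction(1, 3) - y) * x
    return min((0, 1), key=cost)


def persistency(n_cells):
    """Midpoint-rule value of int_0^1 x*(y) dy, i.e. phi({y : x*(y) = 1}) for uniform phi."""
    midpoints = (Fraction(2 * j + 1, 2 * n_cells) for j in range(n_cells))
    return Fraction(sum(optimal_x(y) for y in midpoints), n_cells)


value = persistency(3000)
```

Because the optimal solution is boolean, its first moment and the measure of the set where it equals 1 coincide, which is why a single moment $z_{e(k)0}$ suffices in Corollary 3.7.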

3.5. Estimating the density g{y). By Corollarv l3.6[ one may approximate any 
polynomial functional of the optimal solutions, like for instance the mean, variance, 
etc .. (with respect to the probability measure i^). However, one may also wish to 
approximate (in some sense) the "curve" y i-^ gkiy), that is, the surface described 
by the fc-th coordinate x^(y) of the optimal solution x*(y) when y varies in Y. 

So let g : Y ^ R" be the measurable mapping in Theorem 13 . 31 and suppose that 
one knows some lower bound vector a — (ofc) G M", where: 

a-k < inf { a;fe : (x, y) G K }, fc = 1, . . . , n. 

Then for every fc = 1, . . . , n, the measurable function % : Y ^ R" defined by 

(3.7) y .gfe(y) := .9fc(y) - flfc, y e Y, 

is nonnegative and integrable with respect to ip. 

Hence for every $k = 1, \ldots, n$, one may consider $d\lambda := \hat g_k \, d\varphi$ as a Borel measure on $Y$ with unknown density $\hat g_k$ with respect to $\varphi$, but with known moments $u = (u_\beta)$. Indeed, using (3.4),

$u_\beta := \int_Y y^\beta \, d\lambda(y) = -a_k \int_Y y^\beta \, d\varphi(y) + \int_Y y^\beta g_k(y) \, d\varphi(y)$

(3.8)   $= -a_k \gamma_\beta + z_{e(k)\beta}, \quad \forall \beta \in \mathbb{N}^p,$

where for every $k = 1, \ldots, n$,

$z_{e(k)\beta} = \lim_{i\to\infty} z^i_{e(k)\beta}, \quad \forall \beta \in \mathbb{N}^p,$

with $z^i$ being an optimal (or nearly optimal) solution of the semidefinite relaxation (3.2).
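As a small arithmetic illustration of (3.8), the sketch below assembles the moments $u_\beta$ from $\gamma_\beta$ and $z_{e(k)\beta}$ in the single-parameter case $p = 1$. It assumes $\varphi$ uniform on $[0, 1]$, so that $\gamma_j = 1/(j+1)$, and borrows as stand-in data the five values $z_{k10}$ reported in Example 2 of Section 3.6, together with the lower bound $a_1 = -1$ (an assumption of ours, valid there because $x$ lies in the unit disc). The function name `shifted_moments` is hypothetical.

```python
def shifted_moments(z, gamma, a_k):
    """Return u_j = -a_k * gamma_j + z_j, i.e. formula (3.8) for p = 1."""
    if len(z) != len(gamma):
        raise ValueError("z and gamma must have the same length")
    return [-a_k * g + zj for zj, g in zip(z, gamma)]

# Moments gamma_j = 1/(j+1) of the uniform probability measure on [0, 1].
gamma = [1.0 / (j + 1) for j in range(5)]
# Values z_{k10}, k = 0..4, quoted from Example 2 (approximate moments of x_1^*(y)).
z = [-0.6232, -0.4058, -0.2971, -0.2328, -0.1907]
# Assumed lower bound a_1 = -1 on the coordinate x_1.
u = shifted_moments(z, gamma, a_k=-1.0)
print(u)
```

The resulting numbers are the moments of the nonnegative function $1 + x_1^*(y)$, which is the kind of input a density estimation step requires.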

Hence we are now faced with a density estimation problem, that is: given the sequence of moments $u_\beta = \int_Y y^\beta \hat g_k(y) \, d\varphi$, $\beta \in \mathbb{N}^p$, of the unknown nonnegative measurable function $\hat g_k$ on $Y$, "estimate" $\hat g_k$. One possibility is the so-called maximum entropy approach, briefly described in the next section.

Maximum-entropy estimation. We briefly describe the maximum entropy estimation technique in the univariate case. The multivariate case generalizes easily. Let $g \in L_1([0, 1])$ (footnote 1) be a nonnegative function only known via the first $2d + 1$ moments $u = (u_j)_{j=0}^{2d}$ of its associated measure $d\varphi = g \, dx$ on $[0, 1]$. (In the context of the previous section, the function $g$ to estimate is $y \mapsto \hat g_k(y)$ in (3.7), from the sequence $u$ in (3.8) of its (multivariate) moments.)

From that partial knowledge one wishes (a) to provide an estimate $h_d$ of $g$ such that the first $2d + 1$ moments of the measure $h_d \, dx$ match those of $g \, dx$, and (b) to analyze the asymptotic behavior of $h_d$ when $d \to \infty$. This problem has important applications in various areas of physics, engineering, and signal processing in particular.

An elegant methodology is to search for $h_d$ in a (finitely) parametrized family $\{h_d(\lambda, x)\}$ of functions, and optimize over the unknown parameters $\lambda$ via a suitable criterion. For instance, one may wish to select an estimate $h_d$ that maximizes some appropriate entropy. Several choices of entropy functional are possible as long as one obtains a convex optimization problem in the finitely many coefficients $\lambda_j$. For more details the interested reader is referred to e.g. Borwein and Lewis [6, 7] and the many references therein.

We here choose the Boltzmann-Shannon entropy $H : L_1([0, 1]) \to \mathbb{R} \cup \{-\infty\}$:

(3.9)   $h \mapsto H[h] := -\int_0^1 h(x) \ln h(x) \, dx,$

a strictly concave functional. Therefore, the problem reduces to:

(3.10)   $\sup_h \left\{ H[h] : \int_0^1 x^j h(x) \, dx = u_j, \quad j = 0, \ldots, 2d \right\}.$

The structure of this infinite-dimensional convex optimization problem makes it possible to search for an optimal solution $h_d^*$ of the form:

(3.11)   $x \mapsto h_d^*(x) = \exp\left( \sum_{j=0}^{2d} \lambda_j^* x^j \right),$

and so $\lambda^*$ is an optimal solution of the finite-dimensional unconstrained convex problem

$\theta(u) := \sup_{\lambda} \; \langle u, \lambda \rangle - \int_0^1 \exp\left( \sum_{j=0}^{2d} \lambda_j x^j \right) dx.$

Notice that the above function $\theta$ is just the Legendre-Fenchel transform of the convex function $\lambda \mapsto \int_0^1 \exp\left( \sum_{j=0}^{2d} \lambda_j x^j \right) dx$.

Footnote 1: $L_1([0, 1])$ denotes the Banach space of integrable functions on the interval $[0, 1]$ of the real line, equipped with the norm $\|g\|_1 = \int_0^1 |g(x)| \, dx$.
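An optimal solution can be calculated by applying first-order methods, in which case the gradient $\nabla v_d$ of the function

$\lambda \mapsto v_d(\lambda) := \langle u, \lambda \rangle - \int_0^1 \exp\left( \sum_{j=0}^{2d} \lambda_j x^j \right) dx,$

is provided by:

$\frac{\partial v_d(\lambda)}{\partial \lambda_k} = u_k - \int_0^1 x^k \exp\left( \sum_{j=0}^{2d} \lambda_j x^j \right) dx, \quad k = 0, \ldots, 2d.$

If one applies second-order methods, e.g. Newton's method, then computing the Hessian $\nabla^2 v_d$ at the current iterate $\lambda$ reduces to computing

$\frac{\partial^2 v_d(\lambda)}{\partial \lambda_k \, \partial \lambda_j} = -\int_0^1 x^{k+j} \exp\left( \sum_{\ell=0}^{2d} \lambda_\ell x^\ell \right) dx, \quad k, j = 0, \ldots, 2d.$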

In such simple cases as a box $[a, b]$ (or $[a, b]^n$ in the multivariate case) such quantities can be approximated quite accurately via cubature formulas as described in e.g. [5]. In particular, several cubature formulas behave very well for exponentials of polynomials as shown in e.g. Bender et al. [5]. An alternative with no cubature formula is also proposed in [15].
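The Newton iteration on the dual function $v_d$ can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation used for the experiments: the integrals are approximated with a composite Simpson rule (a simple stand-in for the cubature formulas mentioned above), the iteration starts from $\lambda = 0$, no line search or other safeguard is applied, and all function names are hypothetical.

```python
import math

def simpson_rule(n=400):
    """Nodes and weights of the composite Simpson rule on [0, 1] (n even)."""
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    weights = [h / 3.0 * (1 if i in (0, n) else 4 if i % 2 else 2)
               for i in range(n + 1)]
    return nodes, weights

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def maxent_newton(u, iters=50, tol=1e-10):
    """Maximize v(lam) = <u, lam> - int_0^1 exp(sum_j lam_j x^j) dx by Newton."""
    m = len(u)
    nodes, weights = simpson_rule()
    lam = [0.0] * m
    for _ in range(iters):
        dens = [math.exp(sum(l * x ** j for j, l in enumerate(lam)))
                for x in nodes]
        # mom[r] approximates int_0^1 x^r exp(...) dx for r = 0..2m-2.
        mom = [sum(w * d * x ** r for x, w, d in zip(nodes, weights, dens))
               for r in range(2 * m - 1)]
        grad = [u[k] - mom[k] for k in range(m)]
        if max(abs(g) for g in grad) < tol:
            break
        # The Hessian of v is minus this Hankel matrix of moments.
        neg_hess = [[mom[k + j] for j in range(m)] for k in range(m)]
        step = solve_linear(neg_hess, grad)  # equals -(Hessian)^{-1} grad
        lam = [l + s for l, s in zip(lam, step)]
    return lam

# Usage: the moments of exp(-x) on [0, 1] should give lam close to (0, -1, 0).
e = math.exp(-1.0)
u = [1 - e, 1 - 2 * e, 2 - 5 * e]
print(maxent_newton(u))
```

Because the target density $e^{-x}$ is itself of the form (3.11), the recovered coefficients can be checked against the exact answer. For harder targets a damped step would likely be needed.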

One has the following convergence result which follows directly from [6, Theor. 1.7 and p. 259].

Proposition 3.8. Let $0 \le g \in L_1([0, 1])$ and for every $d \in \mathbb{N}$, let $h_d^*$ in (3.11) be an optimal solution of (3.10). Then, as $d \to \infty$,

$\int_0^1 \psi(x) \left( h_d^*(x) - g(x) \right) dx \to 0,$

for every bounded measurable function $\psi : [0, 1] \to \mathbb{R}$ which is continuous almost everywhere.

Hence, the max-entropy estimate we obtain is not a pointwise estimate of $g$, and so, at some points of $[0, 1]$ the max-entropy density $h_d^*$ and the density $g$ to estimate may differ significantly. However, for sufficiently large $d$, both curves of $h_d^*$ and $g$ are close to each other. In our context, recall that $g$ is for instance $y \mapsto x_k^*(y)$, and so in general, for fixed $y$, $h_d^*(y)$ is close to $x_k^*(y)$ and might be chosen for the $k$-th coordinate of an initial point $x$, input of a local minimization algorithm to find the global minimizer $x^*(y)$.

3.6. Illustrative examples. In this section we provide some simple illustrative examples. To show the potential of the approach we have deliberately chosen very simple examples for which one knows the solutions exactly, so as to compare the results we obtain with the exact optimal value and optimal solutions. The semidefinite relaxations (3.2) were implemented by using the software package GloptiPoly [10]. The max-entropy estimate $h_d^*$ of $g_k$ was computed by using Newton's method, where at each iterate $(\lambda^{(k)}, h_d(\lambda^{(k)}))$:

$\lambda^{(k+1)} = \lambda^{(k)} - \left( \nabla^2 v_d(\lambda^{(k)}) \right)^{-1} \nabla v_d(\lambda^{(k)}).$
Example 1. For illustration purposes, consider the toy example where $Y := [0, 1]$,

$K := \{ (x, y) : 1 - x^2 - y^2 \ge 0; \; x, y \in Y \} \subset \mathbb{R}^2, \quad (x, y) \mapsto f(x, y) := -x^2 y.$

Hence for each value of the parameter $y \in Y$, the unique optimal solution is $x^*(y) := \sqrt{1 - y^2}$. And so in Theorem 3.3(b), $y \mapsto g(y) = \sqrt{1 - y^2}$.

Let $\varphi$ be the probability measure uniformly distributed on $[0, 1]$. Therefore,

$\rho = \int_0^1 J(y) \, d\varphi(y) = -\int_0^1 y (1 - y^2) \, dy = -1/4.$



Solving (3.2) with $i := 3$, that is, with moments up to order 6, one obtains the optimal value $-0.250146$. Solving (3.2) with $i := 4$, one obtains the optimal value $-0.25001786$ and the moment sequence

z = (1, 0.7812, 0.5, 0.6604, 0.3334, 0.3333, 0.5813, 0.25, 0.1964, 0.25, 0.5244, 0.2, 0.1333, 
0.1334, 0.2, 0.4810, 0.1667, 0.0981, 0.0833, 0.0983, 0.1667) 

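Observe that

$z_{1k} - \int_0^1 y^k \sqrt{1 - y^2} \, dy \approx O(10^{-\ldots}), \quad k = 0, \ldots, 4,$

$z_{1k} - \int_0^1 y^k \sqrt{1 - y^2} \, dy \approx O(10^{-\ldots}), \quad k = 5, 6, 7.$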



Using a max-entropy approach to approximate the density $y \mapsto g(y)$ on $[0, 1]$, with the first 5 moments $z_{1k}$, $k = 0, \ldots, 4$, we find that the optimal function $h_d^*$ in (3.11) is obtained with

$\lambda^* = (-0.1564, 2.5316, -12.2194, 20.3835, -12.1867).$

Both curves of $g$ and $h_d^*$ are displayed in Figure 1. Observe that with only 5 moments, the max-entropy solution $h_d^*$ approximates $g$ relatively well, even if it differs significantly at some points. Indeed, the shape of $h_d^*$ resembles very much that of $g$.

Finally, from an optimal solution of (3.5) one obtains for $p \in \mathbb{R}[y]$, the degree-8 univariate polynomial

$y \mapsto p(y) = -0.0004 - 0.9909y - 0.0876y^2 + 1.4364y^3 - 1.2481y^4 + 2.1261y^5 - 2.1309y^6 + 1.1593y^7 - 0.2641y^8$

and Figure 2 displays the curve $y \mapsto J(y) - p(y)$ on $[0, 1]$. One observes that $J \ge p$ and the maximum difference is about $3 \cdot 10^{-4}$ close to 0 and much less for $y \ge 0.1$, a good precision with only 8 moments.
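The quality of this lower approximation can be inspected numerically. The sketch below evaluates $J(y) = -y(1 - y^2)$ and the polynomial $p$ with the coefficients as printed (rounded to four decimals, so the sign of $J - p$ is not guaranteed at that precision) on a uniform grid of $[0, 1]$; the grid size and the names `p`, `J`, `gaps` are our own choices.

```python
def p(y):
    """Degree-8 polynomial of Example 1, coefficients as printed (ascending powers)."""
    coeffs = [-0.0004, -0.9909, -0.0876, 1.4364, -1.2481,
              2.1261, -2.1309, 1.1593, -0.2641]
    val = 0.0
    for c in reversed(coeffs):  # Horner evaluation
        val = val * y + c
    return val

def J(y):
    """Exact optimal value function of Example 1."""
    return -y * (1.0 - y * y)

grid = [i / 1000.0 for i in range(1001)]
gaps = [J(y) - p(y) for y in grid]
print("largest gap J - p :", max(gaps))
print("smallest gap J - p:", min(gaps))
```

At $y = 0$ the gap equals $0.0004$, in line with the order of magnitude reported in the text.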

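Example 2. Again with $Y := [0, 1]$, let

$K := \{ (x, y) : 1 - x_1^2 - x_2^2 \ge 0 \} \subset \mathbb{R}^3, \quad (x, y) \mapsto f(x, y) := y x_1 + (1 - y) x_2.$

For each value of the parameter $y \in Y$, the unique optimal solution $x^* \in K$ satisfies

$(x_1^*(y))^2 + (x_2^*(y))^2 = 1; \quad (x_1^*(y))^2 = \frac{y^2}{y^2 + (1 - y)^2}, \quad (x_2^*(y))^2 = \frac{(1 - y)^2}{y^2 + (1 - y)^2},$

with optimal value

$J(y) = -\frac{y^2}{\sqrt{y^2 + (1 - y)^2}} - \frac{(1 - y)^2}{\sqrt{y^2 + (1 - y)^2}} = -\sqrt{y^2 + (1 - y)^2}.$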
and with $\varphi$ being the probability measure uniformly distributed on $[0, 1]$,

$\rho = \int_0^1 J(y) \, d\varphi(y) = -\int_0^1 \sqrt{y^2 + (1 - y)^2} \, dy \approx -0.81162.$

Solving (3.2) with $i := 3$, that is, with moments up to order 6, one obtains $\rho_3 \approx -0.8117$ with $\rho_3 - \rho \approx O(10^{-\ldots})$. Solving (3.2) with $i := 4$, one obtains $\rho_4 \approx -0.81162$ with $\rho_4 - \rho \approx O(10^{-\ldots})$, and the moment sequence $(z_{k10})$, $k = 0, 1, 2, 3, 4$:

$z_{k10} = (-0.6232, -0.4058, -0.2971, -0.2328, -0.1907),$

and

$z_{k10} - \int_0^1 y^k g_1(y) \, dy \approx O(10^{-\ldots}), \quad k = 0, \ldots, 4.$
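Since Example 2 is solvable by hand, the reported numbers can be cross-checked directly. The sketch below integrates $J(y) = -\sqrt{y^2 + (1-y)^2}$ and the moments of $g_1(y) = x_1^*(y) = -y / \sqrt{y^2 + (1-y)^2}$ with a composite Simpson rule. The closed form $\rho = -\left(\tfrac12 + \operatorname{asinh}(1)/(2\sqrt{2})\right)$ is our own elementary computation, not a formula from the text, and the helper names are hypothetical.

```python
import math

def simpson(f, n=2000):
    """Composite Simpson rule for the integral of f over [0, 1] (n even)."""
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def norm(y):
    return math.sqrt(y * y + (1.0 - y) ** 2)

def g1(y):
    """First coordinate of the optimal solution x^*(y) of Example 2."""
    return -y / norm(y)

rho = simpson(lambda y: -norm(y))
rho_closed = -(0.5 + math.asinh(1.0) / (2.0 * math.sqrt(2.0)))
moments = [simpson(lambda y, k=k: y ** k * g1(y)) for k in range(5)]
z_reported = [-0.6232, -0.4058, -0.2971, -0.2328, -0.1907]

print("rho by quadrature:", rho, " closed form:", rho_closed)
for k, (m, z) in enumerate(zip(moments, z_reported)):
    print(k, m, z, m - z)
```

The printed differences compare exact moments with the four-decimal values quoted above, so they mostly reflect rounding in the quoted sequence.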

Using a max-entropy approach to approximate the density $y \mapsto -g_1(y)$ on $[0, 1]$, with the first 5 moments $z_{k10}$, $k = 0, \ldots, 4$, we find that the optimal function $h_d^*$ in (3.11) is obtained with

$\lambda^* = (-3.61284, 15.66153266, -29.430901, 27.326347, -9.9884452),$

and we find that

$z_{k10} + \int_0^1 y^k h_d^*(y) \, dy \approx O(10^{-\ldots}), \quad k = 0, \ldots, 4.$

In Figure 3 are displayed the two functions $-g_1$ and $h_d^*$, and one observes a very good concordance.
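The moment matching of the max-entropy estimate can be checked in a few lines. The sketch below uses the coefficient vector $\lambda^*$ of Example 2; the printed vector is damaged in our copy, so the digit grouping used here is our reading of it (it is the only grouping for which the resulting density stays close to $-g_1$ at the points we tried). The quadrature rule and helper names are our own.

```python
import math

# Coefficients lambda^* of Example 2 (digit grouping is an assumption, see above).
LAM = [-3.61284, 15.66153266, -29.430901, 27.326347, -9.9884452]

def h(y):
    """Max-entropy density exp(sum_j LAM[j] * y^j) of the form (3.11)."""
    return math.exp(sum(c * y ** j for j, c in enumerate(LAM)))

def minus_g1(y):
    """The target density -g_1(y) = y / sqrt(y^2 + (1 - y)^2)."""
    return y / math.sqrt(y * y + (1.0 - y) ** 2)

def simpson(f, n=2000):
    """Composite Simpson rule for the integral of f over [0, 1] (n even)."""
    step = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / 3.0

diffs = []
for k in range(5):
    m_h = simpson(lambda y, k=k: y ** k * h(y))
    m_g = simpson(lambda y, k=k: y ** k * minus_g1(y))
    diffs.append(m_h - m_g)
    print(k, m_h, m_g, m_h - m_g)
```

Pointwise, `h(0.5)` and `minus_g1(0.5)` agree to about three decimals, while the two functions differ more visibly near the endpoints of the interval.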




Figure 3. Example 2: $h_d^*(y)$ versus $-g_1(y) = y / \sqrt{y^2 + (1 - y)^2}$



Finally, from an optimal solution of (3.5) one obtains for $p \in \mathbb{R}[y]$, the degree-8 univariate polynomial

$y \mapsto p(y) := -1.0000 + 0.9983y - 0.4537y^2 - 0.9941y^3 + 2.2488y^4 - 7.6739y^5 + 11.8448y^6 - 7.9606y^7 + 1.9903y^8$

and Figure 4 displays the curve $y \mapsto J(y) - p(y)$ on $[0, 1]$. One observes that $J \ge p$ and the maximum difference is about $10^{-4}$, a good precision with only 8 moments.




Figure 4. Example 2: $J(y) - p(y)$ on $[0, 1]$

Example 3. In this example one has $Y = [0, 1]$, $(x, y) \mapsto f(x, y) := y x_1 + (1 - y) x_2$, and

$K := \{ (x, y) : y x_1^2 + x_2^2 - y \le 0; \; x_1^2 + y x_2^2 - y \le 0 \}.$

That is, for each $y \in Y$ the set $K_y$ is the intersection of two ellipsoids. It is easy to check that $1 + x_i^*(y) \ge 0$ for all $y \in Y$, $i = 1, 2$. With $i = 4$ the max-entropy estimate $y \mapsto h_d^*(y)$ for $1 + x_1^*(y)$ is obtained with

$\lambda^* = (-0.2894, 1.7192, -19.8381, 36.8285, -18.4828),$

whereas the max-entropy estimate $y \mapsto h_d^*(y)$ for $1 + x_2^*(y)$ is obtained with

$\lambda^* = (-0.1018, -3.0928, 4.4068, 1.7096, -7.5782).$

Figure 5 displays the curves of $x_1^*(y)$ and $x_2^*(y)$, as well as the constraint $h_1(x^*(y), y)$. Observe that $h_1(x^*(y), y) \approx 0$ on $[0, 1]$, which means that for almost all $y \in [0, 1]$, at an optimal solution $x^*(y)$, the constraint $h_1 \le 0$ is saturated. Figure 6 displays the curves of $h_1(x^*(y), y)$ and $h_2(x^*(y), y)$.

Example 4. This time $Y = [0, 1]$, $(x, y) \mapsto f(x, y) := (1 - 2y)(x_1 + x_2)$, and

$K := \{ (x, y) : y x_1^2 + x_2^2 - y = 0; \; x_1^2 + y x_2^2 - y = 0 \}.$

That is, for each $y \in Y$ the set $K_y$ is the intersection of two ellipses.

With $i = 4$ the max-entropy estimate $y \mapsto h_d^*(y)$ for $1 + x_1^*(y)$ is obtained with
$\lambda^* = (0.3071151, -12.51867, 43.215907, -46.985733, 16.395944).$
Figure 5. Example 3: $x_1^*(y)$, $x_2^*(y)$ and $h_1(x^*(y), y)$ on $[0, 1]$

Figure 6. Example 3: $h_1(x^*(y), y)$ and $h_2(x^*(y), y)$ on $[0, 1]$



In Figure 7 are displayed the curves $y \mapsto -p(y)$ and $y \mapsto -J(y)$, whereas in Figure 8 is displayed the curve $y \mapsto p(y) - J(y)$. One may see that $p$ is a good lower approximation of $J$ even with only 8 moments.
Figure 8. Example 4: the curve $p(y) - J(y)$ on $[0, 1]$



On the other hand, in Figure 9 is displayed $h_d^*(y) - 1$ versus $x_1^*(y)$, where the latter is $-\sqrt{y/(1 + y)}$ on $[0, 1/2]$ and $\sqrt{y/(1 + y)}$ on $[1/2, 1]$. Here we see that the discontinuity of $x_1^*(y)$ is difficult to approximate "pointwise" with few moments.

Figure 9. Example 4: $h_d^*(y) - 1$ and $x_1^*(y)$ on $[0, 1]$



We end this section with the case where the density $g_k$ to estimate is a step function, which would be the case in an optimization problem with boolean variables (e.g. the variable $x_k$ takes values in $\{0, 1\}$).

Example 5. Assume that with a single parameter $y \in [0, 1]$, the density $g_k$ to estimate is the step function

$y \mapsto g_k(y) := 1$ if $y \in [0, 1/3] \cup [2/3, 1]$, and $0$ otherwise.

The max-entropy estimate $h_d^*$ in (3.11) with 5 moments is obtained with

$\lambda^* = [-0.65473672, 19.170724, -115.39354, 192.4493171655, -96.226948865],$

and we have

$\int_0^1 y^k h_d^*(y) \, dy - \int_0^1 y^k g_k(y) \, dy \approx O(10^{-\ldots}), \quad k = 0, \ldots, 4.$

In particular, the persistency $\int_0^1 g_k(y) \, dy = 2/3$ of the variable $x_k^*(y)$ is very well approximated (up to $10^{-\ldots}$ precision) by $\int_0^1 h_d^*(y) \, dy$, with only 5 moments.

Of course, in this case and with only 5 moments, the density $h_d^*$ is not a good pointwise approximation of the step function $g_k$; however its "shape" reveals the two steps of value 1 separated by a step of value 0. A better pointwise approximation would require more moments.
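The persistency estimate of Example 5 can be reproduced approximately from the printed coefficients. In the sketch below, the vector $\lambda^*$ is read from a damaged line of our copy, so the digit grouping is an assumption (chosen so that the density takes nearly equal values at $y = 0$ and $y = 1$, as the symmetry of the step function suggests). The exact moments of the step function are computed in closed form; the helper names are hypothetical.

```python
import math

# Coefficients lambda^* of Example 5 (digit grouping is an assumption, see above).
LAM = [-0.65473672, 19.170724, -115.39354, 192.4493171655, -96.226948865]

def h(y):
    """Max-entropy density exp(sum_j LAM[j] * y^j) of the form (3.11)."""
    return math.exp(sum(c * y ** j for j, c in enumerate(LAM)))

def step_moment(k):
    """Exact k-th moment of the indicator of [0, 1/3] U [2/3, 1]."""
    return ((1.0 / 3.0) ** (k + 1) + 1.0 - (2.0 / 3.0) ** (k + 1)) / (k + 1)

def simpson(f, n=2000):
    """Composite Simpson rule for the integral of f over [0, 1] (n even)."""
    step = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * step)
    return total * step / 3.0

h_moments = [simpson(lambda y, k=k: y ** k * h(y)) for k in range(5)]
for k, m in enumerate(h_moments):
    print(k, m, step_moment(k), m - step_moment(k))

# The 0-th moment approximates the persistency Prob(x_k^*(y) = 1) = 2/3.
print("estimated persistency:", h_moments[0])
```

Only the integral of the estimate is meaningful as a persistency; the pointwise values of `h` are not reliable estimates of the step function, as noted above.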
Figure 10. Example 5: $g_k(y) = 1_{[0,1/3] \cup [2/3,1]}(y)$ versus $h_d^*(y)$
4. Appendix 

Proof of Theorem 3.3.

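We already know that $\rho_i \le \rho$ for all $i \ge i_0$. We also need to prove that $\rho_i > -\infty$ for sufficiently large $i$. Let $Q \subset \mathbb{R}[x, y]$ be the quadratic module generated by the polynomials $\{h_j\} \subset \mathbb{R}[x, y]$ that define $K$, i.e.,

$Q := \{ \sigma \in \mathbb{R}[x, y] : \sigma = \sigma_0 + \sum_{j=1}^{t} \sigma_j h_j \text{ with } \{\sigma_j\}_{j=0}^{t} \subset \Sigma[x, y] \}.$

In addition, let $Q(l) \subset Q$ be the set of elements $\sigma \in Q$ which have a representation $\sigma_0 + \sum_{j=1}^{t} \sigma_j h_j$ for some s.o.s. family $\{\sigma_j\} \subset \Sigma[x, y]$ with $\deg \sigma_0 \le 2l$ and $\deg \sigma_j h_j \le 2l$ for all $j = 1, \ldots, t$.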

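Let $i \in \mathbb{N}$ be fixed. As $K$ is compact, there exists $N$ such that $N \pm x^\alpha y^\beta > 0$ on $K$, for all $\alpha \in \mathbb{N}^n$ and $\beta \in \mathbb{N}^p$ with $|\alpha + \beta| \le 2i$. Therefore, under Assumption 3.1(ii), the polynomial $N \pm x^\alpha y^\beta$ belongs to $Q$; see Putinar [18]. But there is even some $l(i)$ such that $N \pm x^\alpha y^\beta \in Q(l(i))$ for every $|\alpha + \beta| \le 2i$. Of course we also have $N \pm x^\alpha y^\beta \in Q(l)$ for every $|\alpha + \beta| \le 2i$, whenever $l \ge l(i)$. Therefore, let us take $l(i) \ge i_0$. For every feasible solution $z$ of $Q_{l(i)}$ one has

$|z_{\alpha\beta}| = |L_z(x^\alpha y^\beta)| \le N, \quad \forall \, |\alpha + \beta| \le 2i.$

This follows from $z_0 = 1$, $M_{l(i)}(z) \succeq 0$ and $M_{l(i) - v_j}(h_j z) \succeq 0$, which implies

$N z_0 \pm z_{\alpha\beta} = L_z(N \pm x^\alpha y^\beta) = L_z(\sigma_0) + \sum_{j=1}^{t} L_z(\sigma_j h_j) \ge 0$

for some $\{\sigma_j\} \subset \Sigma[x, y]$ with $\deg \sigma_j h_j \le 2l(i)$. In particular, $L_z(f) \ge -N \sum_{\alpha, \beta} |f_{\alpha\beta}|$, which proves that $\rho_{l(i)} > -\infty$, and so $\rho_i > -\infty$ for all sufficiently large $i$.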

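From what precedes, and with $k \in \mathbb{N}$ arbitrary, let $l(k) \ge k$ and $N_k$ be such that

(4.1)   $N_k \pm x^\alpha y^\beta \in Q(l(k)) \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } |\alpha + \beta| \le 2k.$

Let $i \ge l(i_0)$, and let $z^i$ be a nearly optimal solution of (3.2) with value

(4.2)   $\rho_i \le L_{z^i}(f) \le \rho_i + \frac{1}{i} \quad \left( \le \rho + \frac{1}{i} \right).$

Fix $k \in \mathbb{N}$. Notice that from (4.1), for every $i \ge l(k)$, one has

$|L_{z^i}(x^\alpha y^\beta)| \le N_k z_0 = N_k, \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } |\alpha + \beta| \le 2k.$

Therefore, for all $i \ge l(i_0)$,

(4.3)   $|z^i_{\alpha\beta}| = |L_{z^i}(x^\alpha y^\beta)| \le N'_k, \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } |\alpha + \beta| \le 2k,$

where $N'_k = \max[N_k, V_k]$, with

$V_k := \max \{ |z^i_{\alpha\beta}| : |\alpha + \beta| \le 2k; \; l(i_0) \le i \le l(k) \}.$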

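Complete each vector $z^i$ with zeros to make it an infinite bounded sequence in $l_\infty$, indexed in the canonical basis $(x^\alpha y^\beta)$ of $\mathbb{R}[x, y]$. In view of (4.3),

(4.4)   $|z^i_{\alpha\beta}| \le N'_k \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } 2k - 1 \le |\alpha + \beta| \le 2k,$

and for all $k = 1, 2, \ldots$.

Hence, let $\hat z^i \in l_\infty$ be the new sequence defined by

$\hat z^i_{\alpha\beta} := \frac{z^i_{\alpha\beta}}{N'_k}, \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } 2k - 1 \le |\alpha + \beta| \le 2k, \quad \forall k = 1, 2, \ldots,$

and in $l_\infty$, consider the sequence $\{\hat z^i\}_i$, as $i \to \infty$.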

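Obviously, the sequence $\{\hat z^i\}_i$ is in the unit ball $B_1$ of $l_\infty$, and so, by the Banach-Alaoglu theorem (see e.g. Ash [2]), there exists $\hat z \in B_1$, and a subsequence $\{i_\ell\}$, such that $\hat z^{i_\ell} \to \hat z$ as $\ell \to \infty$, for the weak $\star$ topology $\sigma(l_\infty, l_1)$ of $l_\infty$. In particular, pointwise convergence holds, that is,

$\lim_{\ell\to\infty} \hat z^{i_\ell}_{\alpha\beta} = \hat z_{\alpha\beta} \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p.$

Next, define

$z_{\alpha\beta} := \hat z_{\alpha\beta} N'_k \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p \text{ with } 2k - 1 \le |\alpha + \beta| \le 2k, \quad \forall k = 1, 2, \ldots$

The pointwise convergence $\hat z^{i_\ell} \to \hat z$ implies the pointwise convergence $z^{i_\ell} \to z$, i.e.,

(4.5)   $\lim_{\ell\to\infty} z^{i_\ell}_{\alpha\beta} = z_{\alpha\beta} \quad \forall \alpha \in \mathbb{N}^n, \beta \in \mathbb{N}^p.$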

Next, let $s \in \mathbb{N}$ be fixed. From the pointwise convergence (4.5) we deduce that

$\lim_{\ell\to\infty} M_s(z^{i_\ell}) = M_s(z) \succeq 0.$

Similarly

$\lim_{\ell\to\infty} M_s(h_j z^{i_\ell}) = M_s(h_j z) \succeq 0, \quad j = 1, \ldots, t.$

As $s$ was arbitrary, we obtain

(4.6)   $M_s(z) \succeq 0; \quad M_s(h_j z) \succeq 0, \quad j = 1, \ldots, t; \quad s = 0, 1, 2, \ldots,$

which by Theorem 3.2 implies that $z$ is the sequence of moments of some finite measure $\mu^*$ with support contained in $K$. Moreover, the pointwise convergence (4.5) also implies that

(4.7)   $\int_Y y^\beta \, d\varphi(y) = \gamma_\beta = \lim_{\ell\to\infty} z^{i_\ell}_{0\beta} = z_{0\beta} = \int_K y^\beta \, d\mu^*, \quad \forall \beta \in \mathbb{N}^p.$

As measures on compact sets are determinate, (4.7) implies that the marginal of $\mu^*$ on $\mathbb{R}^p$ is the probability measure $\varphi$, and so $\mu^*$ is feasible for P. Finally, combining the pointwise convergence (4.5) with (4.2) yields

$\rho \ge \lim_{\ell\to\infty} \rho_{i_\ell} = \lim_{\ell\to\infty} L_{z^{i_\ell}}(f) = L_z(f) = \int_K f \, d\mu^*,$

which in turn yields that $\mu^*$ is an optimal solution of P. And so $\rho_{i_\ell} \to \rho$ as $\ell \to \infty$. As the sequence $(\rho_i)$ is monotone, this yields the desired result (a).

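(b) Next, let $\alpha \in \mathbb{N}^n$ and $\beta \in \mathbb{N}^p$ be fixed, arbitrary. From (4.5), we have:

$\lim_{\ell\to\infty} z^{i_\ell}_{\alpha\beta} = z_{\alpha\beta} = \int_K x^\alpha y^\beta \, d\mu^*,$

and by Theorem 2.2(c),

$\lim_{\ell\to\infty} z^{i_\ell}_{\alpha\beta} = \int_K x^\alpha y^\beta \, d\mu^* = \int_Y y^\beta g(y)^\alpha \, d\varphi(y),$

and as the converging subsequence was arbitrary, the above convergence holds for the whole sequence $(z^i_{\alpha\beta})$. □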
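The positive semidefiniteness conditions (4.6) used in the proof can be illustrated on a toy case. The sketch below is a simplification of ours, not part of the proof: it takes a single variable $y$, the moments $z_k = 1/(k+1)$ of the uniform measure on $[0, 1]$, and the single constraint polynomial $h(y) = y(1 - y)$, for which the moment matrix $M_s(z)$ and the localizing matrix $M_{s-1}(h z)$ are Hankel matrices. Positive definiteness is tested by attempting a Cholesky factorization; the function names are hypothetical.

```python
import math

def hankel(seq, size):
    """size x size Hankel matrix with entries seq[i + j]."""
    return [[seq[i + j] for j in range(size)] for i in range(size)]

def is_positive_definite(mat, tol=1e-12):
    """Attempt a Cholesky factorization; fail if a pivot is not larger than tol."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

s = 3
# Moments z_k = 1/(k+1), k = 0..2s, of the uniform measure on [0, 1].
z = [1.0 / (k + 1) for k in range(2 * s + 1)]
# Localizing sequence for h(y) = y(1 - y): (h z)_k = z_{k+1} - z_{k+2}.
hz = [z[k + 1] - z[k + 2] for k in range(2 * s - 1)]

moment_matrix = hankel(z, s + 1)     # M_s(z)
localizing_matrix = hankel(hz, s)    # M_{s-1}(h z)
print(is_positive_definite(moment_matrix), is_positive_definite(localizing_matrix))

# A sequence violating z_2 >= z_1^2 cannot come from a measure.
z_bad = [1.0, 0.5, 0.2]
print(is_positive_definite(hankel(z_bad, 2)))
```

The first line printed reports both tests for a genuine moment sequence, and the last reports the test for a sequence that no measure can produce. Note that the Cholesky test checks strict positive definiteness, which is slightly stronger than the semidefinite condition in (4.6).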